A Bayesian nonparametric approach for the analysis of multiple categorical item responses


We develop a modeling framework for joint factor and cluster analysis of datasets where multiple categorical response items are collected on a heterogeneous population of individuals. We introduce a latent factor multinomial probit model and employ prior constructions that allow inference on the number of factors as well as clustering of the subjects into homogeneous groups according to their relevant factors. Clustering, in particular, allows us to borrow strength across subjects, therefore helping in the estimation of the model parameters, particularly when the number of observations is small. We employ Markov chain Monte Carlo techniques and obtain tractable posterior inference for our objectives, including sampling of missing data. We demonstrate the effectiveness of our method on simulated data. We also analyze two real-world educational datasets and show that our method outperforms state-of-the-art methods. In the analysis of the real-world data, we uncover hidden relationships between the questions and the underlying educational concepts, while simultaneously partitioning the students into groups of similar educational mastery.
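
As a rough illustration of the kind of data-generating process the abstract describes, the following sketch simulates categorical item responses from a latent factor multinomial probit model in which subjects in the same cluster share a factor vector. It is not the authors' implementation: the function name `simulate_responses`, the fixed number of factors and clusters, the standard normal draws, and the argmax-of-utilities response rule are all assumptions made for illustration. The paper itself infers the number of factors and the clustering from the data via its prior constructions, and no posterior inference is attempted here.

```python
import random


def simulate_responses(n_subjects=6, n_items=4, n_categories=3,
                       n_factors=2, n_clusters=2, seed=0):
    """Simulate categorical responses from a toy latent factor probit model.

    Hypothetical sketch: all sizes and distributions are illustrative
    assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)

    # One latent factor vector per cluster; subjects in a cluster share it.
    cluster_factors = [[rng.gauss(0, 1) for _ in range(n_factors)]
                       for _ in range(n_clusters)]

    # Cluster label for each subject (fixed number of clusters assumed).
    labels = [rng.randrange(n_clusters) for _ in range(n_subjects)]

    # Factor loadings for every (item, category) pair.
    loadings = [[[rng.gauss(0, 1) for _ in range(n_factors)]
                 for _ in range(n_categories)]
                for _ in range(n_items)]

    responses = []
    for i in range(n_subjects):
        theta = cluster_factors[labels[i]]
        row = []
        for j in range(n_items):
            # Latent utility per category: loading . factors + N(0, 1) noise.
            utilities = [
                sum(w * t for w, t in zip(loadings[j][c], theta))
                + rng.gauss(0, 1)
                for c in range(n_categories)
            ]
            # The observed response is the category with the largest utility.
            row.append(max(range(n_categories), key=utilities.__getitem__))
        responses.append(row)
    return labels, responses


if __name__ == "__main__":
    labels, responses = simulate_responses()
    for label, row in zip(labels, responses):
        print(label, row)
```

Each printed row shows a subject's assumed cluster label followed by that subject's simulated response category for every item.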

Journal article

Waters, Andrew, Fronczyk, Kassandra, Guindani, Michele, et al. "A Bayesian nonparametric approach for the analysis of multiple categorical item responses." Journal of Statistical Planning and Inference, 166 (2015), Elsevier: 52-66. https://doi.org/10.1016/j.jspi.2014.07.004.

This is an author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by Elsevier.