Date of Award


Document Type

Campus Access Dissertation


Educational Studies


Educational Psychology / Research

First Advisor

Christine DiStefano


This simulation study examines the performance of fit indices commonly used by applied researchers interested in latent class clustering (LCC), also known as finite mixture modeling. The goal of LCC is to classify subjects from a large heterogeneous set of cases into homogeneous subgroups when the number of subgroups is unknown a priori. Despite the advantages of LCC over traditional clustering, model selection aided by fit indices remains a significant challenge for researchers. Conditions for the simulation study were selected to mirror conditions found in applied educational and psychological research. The accuracy with which common fit indices identify the true LCC model is examined while varying indicator type (i.e., continuous and categorical), sample size, class prevalence, and class enumeration. The factors included in the study were: number of indicators (10), metric level of indicators (two categorical / eight continuous, five categorical / five continuous, eight categorical / two continuous), sample size (400, 800, 1200), class prevalence (pi1 = .59, pi2 = .26, pi3 = .15; pi1 = .45, pi2 = .40, pi3 = .10; pi1 = .89, pi2 = .08, pi3 = .03 for three-class models; pi1 = .40, pi2 = .25, pi3 = .25, pi4 = .10; pi1 = .50, pi2 = .30, pi3 = .13, pi4 = .07; pi1 = .59, pi2 = .19, pi3 = .13, pi4 = .09 for four-class models), and class enumeration (three or four true underlying classes). All categorical indicators were dichotomous, and all continuous indicators were normally distributed. The fit indices examined were Akaike's Information Criterion (AIC), Bayesian Information Criterion (BIC), sample size-adjusted Bayesian Information Criterion (SSBIC), Entropy, Integrated Classification Likelihood Criterion with Bayesian-type Approximation, Lo-Mendell-Rubin likelihood ratio test, and the adjusted Lo-Mendell-Rubin likelihood ratio test.
In the three-class models, the SSBIC had the highest overall accuracy rate in identifying the correct class enumeration (74.4%); in the four-class models, the AIC had the highest overall accuracy rate (39.2%). Overall, the SSBIC tended to identify the correct solution with higher frequency than the other indices. The BIC tended to identify the correct solution with higher frequency than the other indices in models with more continuous than categorical indicators, or when rare classes were not present. When there was a small degree of separation between underlying classes, the AIC tended to identify the correct solution with higher frequency than the other indices, but no index was very accurate.
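
For readers unfamiliar with how the information criteria named above are used for class enumeration, the following is a minimal sketch in Python, not the procedure or software used in the dissertation. It assumes the standard textbook definitions of the AIC, BIC, and SSBIC and a common relative-entropy definition; the function names, log-likelihoods, and parameter counts are hypothetical and serve only to illustrate the calculation. The Integrated Classification Likelihood Criterion and the likelihood ratio tests are not covered.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, BIC, and SSBIC for one fitted model (lower is better)."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        # Sample size-adjusted BIC replaces n with (n + 2) / 24.
        "SSBIC": deviance + n_params * math.log((n_obs + 2.0) / 24.0),
    }


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 mean clearer classification.

    `posteriors` is a list of rows, one per subject, each row holding that
    subject's posterior class probabilities.
    """
    n_obs = len(posteriors)
    n_classes = len(posteriors[0])
    uncertainty = sum(
        -p * math.log(p) for row in posteriors for p in row if p > 0.0
    )
    return 1.0 - uncertainty / (n_obs * math.log(n_classes))


def select_classes(candidates, n_obs):
    """For each index, pick the number of classes with the smallest value.

    `candidates` maps number of classes -> (log_likelihood, n_params).
    """
    scores = {
        k: information_criteria(ll, p, n_obs)
        for k, (ll, p) in candidates.items()
    }
    return {
        name: min(scores, key=lambda k: scores[k][name])
        for name in ("AIC", "BIC", "SSBIC")
    }


# Hypothetical values for illustration only; these are NOT results
# from the study.
example_candidates = {
    2: (-5200.0, 31),
    3: (-5100.0, 47),
    4: (-5090.0, 63),
}
chosen = select_classes(example_candidates, n_obs=400)
print(chosen)
```

Because the three criteria differ only in their penalty term, they can disagree on the number of classes for the same set of fitted models, which is the kind of disagreement the simulation is designed to evaluate.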