TY - JOUR
T1 - Beyond the number of classes
T2 - separating substantive from non-substantive dependence in latent class analysis
AU - Oberski, D. L.
N1 - Funding Information:
Thanks are due to the participants of the 2013 meeting of the Italian Statistical Society (SIS) in Brescia for comments and suggestions. The author was supported by the Netherlands Organization for Scientific Research (NWO) [Veni grant number 451-14-017].
Publisher Copyright:
© 2015, The Author(s).
PY - 2016/6
Y1 - 2016/6
N2 - Latent class analysis (LCA) for categorical data is a model-based clustering and classification technique applied in a wide range of fields including the social sciences, machine learning, psychiatry, public health, and epidemiology. Its central assumption is conditional independence of the indicators given the latent class, i.e. “local independence”; violations can appear as model misfit, often leading LCA practitioners to increase the number of classes. However, when not all of the local dependence is of substantive scientific interest this leads to two options, that are both problematic: modeling uninterpretable classes, or retaining a lower number of substantive classes but incurring bias in the final results and classifications of interest due to remaining assumption violations. This paper suggests an alternative procedure, applicable in cases when the number of substantive classes is known in advance, or when substantive interest is otherwise well-defined. I suggest, in such cases, to model substantive local dependencies as additional discrete latent variables, while absorbing nuisance dependencies in additional parameters. An example application to the estimation of misclassification and turnover rates of the decision to vote in elections of 9510 Dutch residents demonstrates the advantages of this procedure relative to increasing the number of classes.
KW - Bivariate residual
KW - Information criteria
KW - Latent class analysis
KW - Local dependence
KW - Score test
KW - Vote misclassification
UR - http://www.scopus.com/inward/record.url?scp=84933054829&partnerID=8YFLogxK
U2 - 10.1007/s11634-015-0211-0
DO - 10.1007/s11634-015-0211-0
M3 - Article
AN - SCOPUS:84933054829
SN - 1862-5347
VL - 10
SP - 171
EP - 182
JO - Advances in Data Analysis and Classification
JF - Advances in Data Analysis and Classification
IS - 2
ER -