Method. The step-by-step approach shows how to (1) create multiple imputed data sets that adequately account for the fact that the predictor of interest (i.e., the latent class variable) is 100% missing, (2) fit an inclusive (i.e., one-step) LCA with covariates within each imputed data set, and (3) combine covariate effect estimates across imputed data sets in a way that accounts for both within- and between-data set variance. Our recommended approach is evaluated via a simulation study and is illustrated by combining PROC MI, PROC LCA, and PROC MIANALYZE in SAS.
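Step (3) can be made concrete with the standard pooling rules for multiple imputation (Rubin's rules). The following is a minimal Python sketch under stated assumptions, not the SAS implementation used in this study: the function name `pool_rubin` and the example numbers are hypothetical, and the inputs are assumed to be one covariate effect estimate and its squared standard error from the LCA fitted to each imputed data set.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")

    # Pooled point estimate: mean of the per-data-set estimates.
    q_bar = sum(estimates) / m

    # Within-imputation variance: mean of the per-data-set variances.
    within = sum(variances) / m

    # Between-imputation variance: sample variance of the estimates.
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)

    # Total variance adds a finite-m correction to the between component.
    total = within + (1 + 1 / m) * between

    return q_bar, total, math.sqrt(total)


# Hypothetical log-odds estimates for one covariate from three imputations.
pooled, total_var, pooled_se = pool_rubin([0.4, 0.5, 0.6], [0.01, 0.01, 0.01])
```

The total variance exceeds the average within-data set variance whenever the estimates differ across imputations, which is how the pooled standard error reflects uncertainty due to missing data.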
Results. Our simulation study shows that our recommended approach can provide reduced bias and error relative to listwise deletion; the approach is presented in the context of empirical data on multiple risk factors for substance use. We show that a sufficiently rich imputation model, one that reflects the complex nature of a latent class variable, is important for recovering accurate covariate effect estimates. We also make specific recommendations about how to capitalize on the advantages of maximum likelihood estimation for missing data on indicators when using multiple imputation for covariates, in order to reduce potential issues with model convergence and identification.
Discussion. This study provides a practical approach to handling missing data on external variables when using LCA. Advantages and disadvantages of the approach are discussed, along with alternatives based on the Bolck, Croon and Hagenaars (2004) adjusted three-step approach to LCA with covariates. The approach gives prevention scientists using LCA a statistically sound way to handle missing data on external variables, while reducing the difficulty of combining multiple imputation with LCA.