Abstract: Multiple Imputation of Missing Covariate Information in Latent Class Analysis: Evaluation of a Step-By-Step Approach (Society for Prevention Research 27th Annual Meeting)

559 Multiple Imputation of Missing Covariate Information in Latent Class Analysis: Evaluation of a Step-By-Step Approach

Friday, May 31, 2019
Regency B (Hyatt Regency San Francisco)
* noted as presenting author
John J. Dziak, PhD, Research Assistant Professor, The Pennsylvania State University, University Park, PA
Bethany Bray, PhD, Associate Research Professor, The Pennsylvania State University, University Park, PA
Introduction. Missing data are a practical problem faced by almost every prevention scientist. Although most modern software packages for latent class analysis (LCA) can handle missing data on indicator variables using full-information maximum likelihood estimation under the missing at random assumption, most packages cannot handle missing data on other variables included in the model, such as covariates, grouping variables and outcomes. Listwise deletion remains a common approach to dealing with missing data on these variables, but it is well known that this approach can cause bias and statistical inefficiency. Multiple imputation is an attractive alternative to listwise deletion, but it can be difficult to implement in LCA. Not only must the imputation model be adequately specified despite a latent variable with 100% missingness by definition, but the ordering and meaning of model parameters may change from one imputed data set to another (i.e., “label switching”). This study describes, demonstrates, and evaluates a step-by-step approach to using multiple imputation with LCA.

Method. The step-by-step approach shows how to (1) create multiple imputed data sets that adequately account for the fact that the predictor of interest (i.e., the latent class variable) is 100% missing, (2) fit an inclusive (i.e., one-step) LCA with covariates within each imputed data set, and (3) combine covariate effect estimates across imputed data sets in a way that accounts for both within- and between-data-set variance. Our recommended approach is evaluated via a simulation study and is illustrated by combining PROC MI, PROC LCA and PROC MIANALYZE in SAS.
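
Step (3) can be illustrated with Rubin's rules, the standard combining rules for multiply imputed estimates. The following is a minimal Python sketch of those rules, not the authors' SAS code: the function name `pool_rubin` and the numbers in the usage example are hypothetical and chosen only for illustration. It assumes that the class labels have already been aligned across imputed data sets, so that each estimate refers to the same latent class.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one covariate effect across m imputed data sets (Rubin's rules).

    estimates  -- point estimates of the same parameter, one per data set
    std_errors -- the corresponding standard errors
    Returns (pooled estimate, pooled standard error).
    Assumes the parameter refers to the same latent class in every
    data set, i.e., any label switching has been resolved beforehand.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")

    pooled = mean(estimates)                      # average point estimate
    within = mean([se ** 2 for se in std_errors]) # within-data-set variance
    between = variance(estimates)                 # between-data-set variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between        # total variance
    return pooled, math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical log-odds estimates from three imputed data sets.
    est, se = pool_rubin([0.5, 0.6, 0.7], [0.1, 0.1, 0.1])
    print(f"pooled estimate = {est:.3f}, pooled SE = {se:.3f}")
```

The pooled standard error is larger than any single within-data-set standard error whenever the estimates differ across data sets, which is how the extra uncertainty due to the missing covariate values is carried into the final inference.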

Results. Our simulation study shows that our recommended approach can provide reduced bias and error relative to listwise deletion; the approach is presented in the context of empirical data on multiple risk factors for substance use. We show that it is important to have a sufficiently rich imputation model that reflects the complex nature of a latent class variable in order to recover accurate covariate effect estimates. We also make specific recommendations about how to capitalize on the advantages of maximum likelihood estimation for missing data on indicators when using multiple imputation for covariates, in order to reduce potential issues with model convergence and identification.

Discussion. This study provides a practical approach to handling missing data on external variables when using LCA. Advantages and disadvantages of the approach are discussed, as are alternatives using the Bolck, Croon and Hagenaars (2004) adjusted three-step approach to LCA with covariates. This study provides a way for prevention scientists using LCA to handle the problem of missing data on external variables in a statistically sound way, while reducing the difficulty of combining multiple imputation with LCA.