Abstract: Missing Data Strategies for Real World Multilevel Data: What Should I Do When the Recommended Methods Fail? (Society for Prevention Research 24th Annual Meeting)

434 Missing Data Strategies for Real World Multilevel Data: What Should I Do When the Recommended Methods Fail?

Schedule:
Thursday, June 2, 2016
Pacific B/C (Hyatt Regency San Francisco)
* noted as presenting author
Stefany Coxe, PhD, Assistant Professor, Florida International University, Miami, FL
Tyler Stout, MA, Graduate Assistant, Florida International University, Miami, FL
Introduction

Multiple imputation (MI) and maximum likelihood (ML) are the recommended techniques for handling missing data. Both approaches reduce or eliminate the bias in estimation caused by missing values, and both can be implemented in common software packages such as SPSS, SAS, and Mplus. However, MI and ML can be difficult to use successfully in some real-world situations involving large, complexly structured data, intricate statistical models, and/or large amounts of missingness. In these situations, more traditional approaches that are generally not recommended (e.g., single imputation via expectation-maximization (EM), listwise or pairwise deletion, mean imputation) may be the only viable alternative. Given the choice between an analysis that does not work and a sub-optimal approach, how poorly does the sub-optimal approach work? A small-scale Monte Carlo simulation will be conducted using a commonly employed data structure and analysis, that of a cross-sectional multilevel (mixed) model. Multilevel models are of particular interest when observations are clustered in some way, such as children within classrooms or individuals within neighborhoods. Multilevel models are also of interest because they are relatively robust to missing values when the data are missing at random (MAR).

Methods

A Monte Carlo simulation will be conducted in SAS 9.4 using datasets of 30 clusters with 10 observations per cluster. In each dataset, a single individual-level outcome variable will be regressed on a single cluster-level predictor variable, a single individual-level predictor variable, and their cross-level interaction. The effect size of each predictor variable will be varied (zero, small, medium, large), as will the amount of clustering, indexed by the intraclass correlation (ICC). The proportion of missing values on the individual-level predictor will be varied (1%, 5%, 15%, 25%) under a missing at random (MAR) mechanism.
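
As a rough illustration of the design described above, the following Python sketch generates one dataset with the stated structure. It is not the authors' SAS code. The function name, the default coefficient values, the standard normal predictors, the use of a residual (conditional) ICC with total residual variance fixed at 1, and the rule that the probability of a missing value depends on the fully observed cluster-level predictor are all assumptions made for the example; the abstract does not specify them.

```python
import math
import random


def simulate_dataset(n_clusters=30, cluster_size=10,
                     g01=0.3, g10=0.3, g11=0.3,
                     icc=0.2, miss_rate=0.15, seed=0):
    """Generate one two-level dataset with MAR missingness on x.

    Assumed model (hypothetical, for illustration only):
        y_ij = g01*w_j + g10*x_ij + g11*w_j*x_ij + u_j + e_ij
    with Var(u_j) = icc and Var(e_ij) = 1 - icc, so that icc is the
    residual intraclass correlation.
    """
    rng = random.Random(seed)
    tau = math.sqrt(icc)          # SD of the cluster random intercept
    sigma = math.sqrt(1.0 - icc)  # SD of the individual-level residual
    rows = []
    for j in range(n_clusters):
        w = rng.gauss(0.0, 1.0)   # cluster-level predictor
        u = rng.gauss(0.0, tau)   # cluster random intercept
        # Assumed MAR rule: the chance that x is missing rises with the
        # observed w. Because the normal CDF of a standard normal w has
        # mean 0.5, the expected missing rate equals miss_rate.
        phi_w = 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))
        p_miss = 2.0 * miss_rate * phi_w
        for _ in range(cluster_size):
            x = rng.gauss(0.0, 1.0)  # individual-level predictor
            e = rng.gauss(0.0, sigma)
            y = g01 * w + g10 * x + g11 * w * x + u + e
            x_obs = None if rng.random() < p_miss else x
            rows.append({"cluster": j, "w": w, "x": x_obs, "y": y})
    return rows


if __name__ == "__main__":
    data = simulate_dataset()
    n_missing = sum(row["x"] is None for row in data)
    print(len(data), "rows,", n_missing, "missing x values")
```

In a full simulation, a function like this would be called once per replication within each design cell (effect size by ICC by missing rate), with a different seed each time.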

Results

Multilevel data will be analyzed with appropriate multilevel models, using traditional strategies for missing data handling (i.e., listwise/pairwise deletion, mean imputation) as well as newer imputation (EM and MI) and maximum likelihood-based (full-information ML) methods. These methods will be compared in terms of parameter estimate bias, Type I error, confidence interval coverage, and statistical power.
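
The comparison criteria named above can be computed from the per-replication estimates and standard errors for a given parameter. The sketch below shows one common way to do so. The function name, the Wald-type intervals and tests with a normal critical value of 1.96, and the example numbers are assumptions for illustration, not details taken from the study.

```python
def summarize_replications(estimates, std_errors, true_value, z_crit=1.96):
    """Summarize Monte Carlo replications for a single parameter.

    Returns raw bias, confidence interval coverage, and the rejection
    rate of the test that the parameter is zero. The rejection rate is
    the Type I error rate when true_value is 0 and the statistical
    power otherwise.
    """
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    covered = 0
    rejected = 0
    for est, se in zip(estimates, std_errors):
        lower = est - z_crit * se
        upper = est + z_crit * se
        if lower <= true_value <= upper:
            covered += 1
        if abs(est / se) > z_crit:
            rejected += 1
    return {
        "bias": bias,
        "coverage": covered / n,
        "rejection_rate": rejected / n,
    }


if __name__ == "__main__":
    # Made-up values for four replications of a truly null effect.
    result = summarize_replications(
        estimates=[0.1, -0.1, 0.5, 0.0],
        std_errors=[0.1, 0.1, 0.1, 0.1],
        true_value=0.0,
    )
    print(result)
```

Applying the same summary to each missing data method within each design cell puts the methods on a common scale for comparison.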

Conclusions

The results of this study will help researchers decide how to proceed when the recommended methods for missing data handling fail. Knowing the degree of bias and the impact on power and Type I error when sub-optimal approaches are necessary will help researchers make informed decisions about their results.