Abstract: Analyzing Skewed Data with Many Zeros in Youth Prevention Research (Society for Prevention Research 26th Annual Meeting)

234 Analyzing Skewed Data with Many Zeros in Youth Prevention Research

Schedule:
Wednesday, May 30, 2018
Columbia A/B (Hyatt Regency Washington, Washington, DC)
* noted as presenting author
Aaron Boulton, PhD, Senior Biostatistician, University of Delaware, Newark, DE
Anne Williford, PhD, Associate Professor, Colorado State University, Fort Collins, CO
Jeff Jenson, PhD, Professor, University of Denver, Denver, CO
Purpose: Prevention scientists often analyze outcomes that contain severe positive skewness and many zero observations. Variables with these features can arise when researchers measure low base-rate behaviors (e.g., bullying, substance use, school dropout) or use measurement tools that cannot differentiate individuals with low standing on the construct being assessed (floor effects). The purpose of this presentation is to describe the issues that arise when dealing with severely skewed outcomes and review alternative methods that may be more suitable for analyses.

Method: The meaning of the zero values in skewed data is an important consideration when choosing an analytic approach. Two common causes of abundant zeros - censoring due to measurement limitations and the true absence of the construct of interest - are discussed. Problems that arise with the application of standard statistical methods (e.g., linear regression) to skewed, zero-heavy data are highlighted, along with common approaches used to address such data and their limitations. Promising but less well-known analytic alternatives are reviewed, including the Tobit or censored regression model and the two-part model. These techniques are illustrated with an example analysis using data from a study of bullying prevention in urban youth.
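
To make the censored regression idea concrete, the following is a minimal sketch of a Tobit log-likelihood for an outcome censored at a floor of zero. It is not the authors' analysis: the data are simulated, and the function name `tobit_loglik`, the parameter values, and the crude grid search over the intervention effect are all hypothetical choices made for illustration. A real analysis would estimate all parameters jointly with a numerical optimizer or a dedicated statistics package.

```python
import math
import random

from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def tobit_loglik(y, x, b0, b1, sigma, floor=0.0):
    """Log-likelihood of a Tobit model left-censored at `floor`.

    Latent outcome: y* = b0 + b1 * x + e, with e ~ Normal(0, sigma^2).
    Observed outcome: y = max(floor, y*).
    """
    total = 0.0
    for yi, xi in zip(y, x):
        mu = b0 + b1 * xi
        if yi <= floor:
            # Censored case: contributes P(y* <= floor).
            p = STD_NORMAL.cdf((floor - mu) / sigma)
            total += math.log(max(p, 1e-300))  # guard against log(0)
        else:
            # Uncensored case: contributes the normal log-density at yi.
            z = (yi - mu) / sigma
            total += -0.5 * z * z - 0.5 * math.log(2 * math.pi) - math.log(sigma)
    return total


# Simulated data (hypothetical values, not from the study).
random.seed(0)
n = 500
treated = [i % 2 for i in range(n)]  # 0 = control, 1 = intervention
latent = [0.5 - 0.8 * t + random.gauss(0, 1) for t in treated]
observed = [max(0.0, v) for v in latent]  # floor effect at zero

# Compare the likelihood at the simulating values with a "no effect" model.
ll_true = tobit_loglik(observed, treated, 0.5, -0.8, 1.0)
ll_null = tobit_loglik(observed, treated, 0.5, 0.0, 1.0)

# Crude grid search over the intervention effect, holding b0 and sigma fixed.
grid = [-2.0 + 0.1 * k for k in range(31)]
best_b1 = max(grid, key=lambda b1: tobit_loglik(observed, treated, 0.5, b1, 1.0))

print("log-likelihood with effect:", round(ll_true, 1))
print("log-likelihood without effect:", round(ll_null, 1))
print("grid estimate of intervention effect:", round(best_b1, 1))
```

The key design point is that zeros contribute a probability (the chance that the latent value falls at or below the floor) rather than a density, which is what distinguishes the Tobit likelihood from ordinary linear regression.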

Results: Each technique was used to address the same question: whether participation in the intervention resulted in lower levels of bully victimization at the end of program implementation in the spring of 5th grade. According to the analyses, two approaches, the Tobit model for censored variables and the two-part model for outcomes with true zeros, were particularly appropriate for use with severely skewed data.

Conclusions: For data that are censored at the floor of a scale, the Tobit model is an established, effective technique that continues to be actively researched. The Tobit model’s latent variable formulation is also intuitively appealing for a field like prevention science that is well-versed in the theory and methods of latent variable measurement. For semicontinuous outcomes containing true zeros, the two-part model is an apt choice because it gives researchers considerable flexibility in modeling mixtures of zeros and non-zeros directly. Greater awareness of the issues that arise with skewed, zero-heavy data, and of the proper use of methods designed for such data, can help elevate the work of prevention scientists who routinely deal with these outcomes. The methods reviewed here provide prevention researchers with an augmented set of analytic tools that, when appropriate and properly applied, can boost progress in unleashing the power of prevention for today’s youth.
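
As an illustration of the two-part idea, the sketch below separates a semicontinuous outcome into (1) whether the value is non-zero and (2) how large it is when it is non-zero. It is an assumption-laden example, not the authors' model: the data are simulated, the function name `two_part_by_group` and all parameter values are hypothetical, and the positive part is assumed to be log-normal. Because the only predictor is a binary intervention indicator, both parts are saturated, so the logistic maximum likelihood fit reduces to group proportions and the linear fit of the log outcome reduces to group means. With additional covariates, each part would need a fitted regression.

```python
import math
import random

from statistics import mean


def two_part_by_group(y, group):
    """Two-part summary for a semicontinuous outcome and a binary predictor.

    Part 1: log odds ratio for P(y > 0), intervention vs. control.
    Part 2: difference in mean log(y) among non-zero values.
    Assumes each group has both zero and non-zero observations.
    """
    stats = {}
    for g in (0, 1):
        values = [yi for yi, gi in zip(y, group) if gi == g]
        positives = [v for v in values if v > 0]
        stats[g] = {
            "p_nonzero": len(positives) / len(values),
            "mean_log_pos": mean(math.log(v) for v in positives),
        }
    p0, p1 = stats[0]["p_nonzero"], stats[1]["p_nonzero"]
    log_odds_ratio = math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0))
    log_mean_diff = stats[1]["mean_log_pos"] - stats[0]["mean_log_pos"]
    return stats, log_odds_ratio, log_mean_diff


# Simulated data (hypothetical values, not from the study).
random.seed(1)
n = 2000
group = [i % 2 for i in range(n)]  # 0 = control, 1 = intervention
outcome = []
for g in group:
    p_nonzero = 0.4 if g == 1 else 0.6  # chance of any victimization
    mu_log = 0.7 if g == 1 else 1.0     # mean of log(y) when y > 0
    if random.random() < p_nonzero:
        outcome.append(random.lognormvariate(mu_log, 0.5))
    else:
        outcome.append(0.0)  # true zero: the behavior did not occur

stats, log_odds_ratio, log_mean_diff = two_part_by_group(outcome, group)
print("P(non-zero) by group:", {g: round(s["p_nonzero"], 3) for g, s in stats.items()})
print("log odds ratio for any occurrence:", round(log_odds_ratio, 3))
print("difference in mean log level among non-zeros:", round(log_mean_diff, 3))
```

Reporting the two parts separately lets an intervention effect on whether the behavior occurs at all be distinguished from an effect on its level when it does occur.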