Abstract: Two-Part Modeling Approach to Mediation Analysis for Semi-Continuous Outcomes (Society for Prevention Research 26th Annual Meeting)

233 Two-Part Modeling Approach to Mediation Analysis for Semi-Continuous Outcomes

Schedule:
Wednesday, May 30, 2018
Columbia A/B (Hyatt Regency Washington, Washington, DC)
* noted as presenting author
JeeWon Cheong, PhD, Associate Professor, University of Florida, Gainesville, FL
Soyeon Jung, MS, Data Analyst, University of Florida, Gainesville, FL
Sang Don Lim, MA, Graduate Student, Sungkyunkwan University, Seoul, South Korea
Introduction: Many outcomes in health research are semi-continuous: a substantial number of participants have a response value of 0, and the rest have positive response values. For example, in a national survey on alcohol use, 67% of adolescents aged 12 to 20 years reported no alcohol use in the past month, while the remaining 33% drank some amount of alcohol (Kann et al., 2016). Despite these distributional properties, semi-continuous variables are often treated as continuous or are dichotomized, which can result in inflated Type I error rates, low statistical power, and inaccurate confidence intervals. Two-part modeling is well suited to semi-continuous variables because it allows two separate but correlated processes to be investigated: one distinguishing zero from non-zero responses and the other determining the level of the non-zero responses. The number of empirical studies using two-part modeling has been increasing, but the approach has not been applied to mediation analysis. We conducted a simulation study to examine the bias, Type I error rates, and statistical power of the two-part modeling approach to mediation analysis.
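As an illustration of the two processes, a two-part (hurdle) model for a count outcome can be written as below. This is a sketch only: the symbols \pi (probability of a non-zero response) and \lambda (Poisson rate of the count process) are notation assumed here, and the zero-truncated Poisson form is chosen to match the count outcome used in the simulation.

```latex
\[
P(Y = y) =
\begin{cases}
1 - \pi, & y = 0, \\[4pt]
\pi \, \dfrac{\lambda^{y} e^{-\lambda}}{y! \,\bigl(1 - e^{-\lambda}\bigr)}, & y = 1, 2, \dots
\end{cases}
\]
```

The first line describes the binary process (zero vs. non-zero), and the second describes the level of the non-zero responses, with the Poisson probabilities rescaled so that they sum to 1 over the positive integers.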

Methods: Data generation and statistical analysis were conducted in R 3.3.3. The independent and mediating variables were generated as continuous variables from normal distributions. The outcome variable was generated as a semi-continuous count variable, using binomial and zero-truncated Poisson distributions. Under each of 18 simulation conditions, defined by sample size and effect size, we generated 1000 samples and estimated statistical power and Type I error rates at a significance level of 0.05. The two-part model was fitted to the generated data with the hurdle function in the pscl package. Because the outcome variable was semi-continuous, mediated effects were calculated in two parts: (1) from the model for zero vs. non-zero outcome values, i.e., the binary process mediated effect, and (2) from the model for non-zero outcome values, i.e., the count process mediated effect.
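The data-generating scheme described above can be sketched as follows. The study itself used R; this sketch uses Python's standard library purely for illustration and is not the authors' code. The function names, the coefficient names (a, b_zero, b_count, and so on), the logit and log links, and the value 0.39 are all assumptions made for the example, since the abstract does not report the generating equations or effect-size values.

```python
import math
import random


def rztpois(lam, rng):
    """Draw one value from a zero-truncated Poisson(lam) by inverse-CDF sampling."""
    # P(Y = 1) under truncation; -expm1(-lam) equals 1 - exp(-lam), computed stably.
    p = lam * math.exp(-lam) / -math.expm1(-lam)
    u = rng.random()
    k, cdf = 1, p
    # Walk up the support using P(k) = P(k - 1) * lam / k.
    # The p > 0 guard stops the loop if rounding keeps cdf just below u.
    while u > cdf and p > 0.0:
        k += 1
        p *= lam / k
        cdf += p
    return k


def simulate(n, a, b_zero, b_count, c_zero=0.0, c_count=0.0, seed=1):
    """Generate (x, m, y) rows with a semi-continuous count outcome y.

    All coefficient names are hypothetical: a is the X -> M path, b_* are the
    M -> Y paths and c_* the direct X -> Y paths in the binary (zero) and
    count parts of the outcome model.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                # independent variable
        m = a * x + rng.gauss(0.0, 1.0)        # mediator
        # Binary process: probability of a non-zero response (logit link assumed).
        p_nonzero = 1.0 / (1.0 + math.exp(-(b_zero * m + c_zero * x)))
        # Count process: rate of the zero-truncated Poisson (log link assumed).
        lam = math.exp(b_count * m + c_count * x)
        y = rztpois(lam, rng) if rng.random() < p_nonzero else 0
        rows.append((x, m, y))
    return rows


# Illustrative call; 0.39 is a placeholder effect size, not a value from the study.
rows = simulate(n=2000, a=0.39, b_zero=0.39, b_count=0.39)
```

Setting a path to 0 in such a generator produces data with no mediated effect through that path, which is the kind of condition under which a Type I error rate is evaluated.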

Results: Type I error rates were below 0.05 for sample sizes smaller than 500, for both the binary and count process mediated effects, regardless of effect size. When the sample size reached 1000, Type I error rates were close to 0.05 for medium and large effects. Statistical power was low for the binary process mediated effect: for medium and large effects, the sample sizes needed for power greater than 0.80 were 500 and 300, respectively, and under the small effect size condition a sample size greater than 1000 was needed. For the count process mediated effect, the sample sizes needed for power greater than 0.80 were 100 and 50 for medium and large effects, respectively, and a sample size greater than 300 was needed for the small effect size. Biases and relative biases were below 10% under all conditions.

Conclusion: The two-part modeling approach to mediation analysis is useful for testing mediated effects when the outcome variable is semi-continuous, because it takes the distributional properties of the outcome into account and provides detailed information about the mediating mechanisms (binary and count process mediated effects). However, a larger sample size is needed to achieve acceptable statistical power for testing mediated effects, especially for the binary process.