Methods: Data generation and statistical analysis were conducted in R 3.3.3. The independent and mediating variables were generated as continuous variables from normal distributions. The outcome variable was generated as a semi-continuous count variable using binomial and zero-truncated Poisson distributions. For each of 18 simulation conditions, defined by sample size and effect size, we generated 1000 samples and estimated power and Type I error rates at a significance level of 0.05. The two-part model was fit to the generated data with the hurdle function in the pscl package. Because the outcome variable was semi-continuous, mediated effects were calculated in two parts: (1) the model for zero vs. non-zero outcome values, i.e., the binary process mediated effect, and (2) the model for non-zero outcome values, i.e., the count process mediated effect.
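The data-generating scheme described above can be illustrated with a minimal sketch. It is written in Python with the standard library only, not in the R code used in the study, and every coefficient value, the intercept, and the function names `rztpois` and `simulate_sample` are hypothetical choices made for illustration. It assumes a logistic link for the zero vs. non-zero part and a log link for the count part, which the text does not state.

```python
import math
import random


def rztpois(lam, rng):
    """Draw one value from a zero-truncated Poisson(lam) by inverting its CDF."""
    u = rng.random()
    k = 1
    # P(Y = 1 | Y > 0) = lam * exp(-lam) / (1 - exp(-lam))
    p = lam * math.exp(-lam) / (-math.expm1(-lam))
    cdf = p
    # Walk up the support until the cumulative probability covers u.
    # The p > 0 guard stops the loop if floating-point underflow stalls the CDF.
    while u > cdf and p > 0.0:
        k += 1
        p *= lam / k  # P(Y = k) = P(Y = k - 1) * lam / k
        cdf += p
    return k


def simulate_sample(n, a=0.3, b_bin=0.3, b_cnt=0.3, seed=1):
    """Generate n rows (x, m, y) with a semi-continuous count outcome y.

    a, b_bin, b_cnt and the count intercept 0.5 are illustrative values only.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)            # independent variable
        m = a * x + rng.gauss(0.0, 1.0)    # mediator
        # Binary process: probability that the outcome is non-zero.
        p_nonzero = 1.0 / (1.0 + math.exp(-(b_bin * m)))
        if rng.random() < p_nonzero:
            # Count process: positive counts from a zero-truncated Poisson.
            y = rztpois(math.exp(0.5 + b_cnt * m), rng)
        else:
            y = 0
        rows.append((x, m, y))
    return rows


sample = simulate_sample(500)
```

Each simulated outcome is either an exact zero or a positive count, which is the structure a two-part (hurdle) model separates into its binary and count components.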
Results: Simulation results showed that Type I error rates were lower than 0.05 for sample sizes below 500, for both binary and count process mediated effects, regardless of effect size. When the sample size reached 1000, Type I error rates were close to 0.05 for medium and large effects. Statistical power was low for the binary process mediated effect: the sample size needed for power greater than 0.80 was 500 for medium effects and 300 for large effects, and a sample size greater than 1000 was needed for small effects. For the count process mediated effect, the sample size needed for power greater than 0.80 was 100 for medium effects and 50 for large effects, and a sample size greater than 300 was needed for small effects. Biases and relative biases were below 10% under all conditions.
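The summary quantities reported above are conventionally computed across replications as in the following sketch. It assumes that each replication yields a mediated-effect estimate and a reject or retain decision at the 0.05 level; how that decision was reached in the study is not specified here, and the function names `rejection_rate` and `bias_summary` are hypothetical.

```python
from statistics import mean


def rejection_rate(rejected):
    """Proportion of replications in which the null hypothesis was rejected.

    This is the empirical Type I error rate when the true mediated effect
    is zero, and the empirical power when it is non-zero.
    """
    return sum(1 for r in rejected if r) / len(rejected)


def bias_summary(estimates, true_effect):
    """Return (bias, relative bias) of the estimates around the true effect.

    Relative bias is undefined for a true effect of zero, so None is returned.
    """
    bias = mean(estimates) - true_effect
    relative_bias = bias / true_effect if true_effect != 0 else None
    return bias, relative_bias


# Toy input for illustration only; these are not results from the study.
power = rejection_rate([True, False, True, True])
bias, relative_bias = bias_summary([0.11, 0.09, 0.10, 0.12], 0.10)
```

In a full simulation, each list would hold 1000 entries per condition, one for each generated sample.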
Conclusion: The two-part modeling approach to mediation analysis is useful for testing mediated effects when the outcome variable is semi-continuous, because it accounts for the distributional properties of the outcome variable and provides detailed information about the mediating mechanisms (binary and count process mediated effects). However, a larger sample size is needed to achieve acceptable statistical power for estimating mediated effects, especially for the binary process.