Methods: We integrated relevant data from eight archived longitudinal adolescent-focused datasets. After we harmonized the data and imputed missing values, several psychosocial variables (intentions, normative beliefs and values) proved to be strong predictors of the substance use outcomes that were included in both the integrated dataset and the randomized controlled trials (RCTs) to be tested. We developed an algorithm that used percentiles from the integrated dataset to create age- and gender-specific latent psychosocial scores. The algorithm matched each treatment case’s age, gender and observed psychosocial scores at pretest to create a virtual control case, which was then allowed to “mature” based on age-related changes while its percentile was held constant. Each virtual control was present whenever its treatment case was, eliminating differential attrition as a threat to validity. Virtual case substance use was estimated from the virtual case’s latent psychosocial score using logistic regression coefficients derived from analyzing the treatment group. Averaging across virtual cases created group estimates of prevalence.
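The steps described above can be illustrated with a minimal sketch. It is not the study's implementation: the reference scores, the nearest-rank percentile rule, the function names and the regression coefficients passed in are all hypothetical, chosen only to show how a pretest score might be converted to a percentile, carried forward to an older age at the same percentile, and turned into a probability of use.

```python
import math
from bisect import bisect_right

# Hypothetical reference table: sorted latent psychosocial scores from an
# integrated dataset, keyed by (age, gender). All values are invented.
REFERENCE = {
    (12, "F"): [0.5, 1.0, 1.5, 2.0, 2.5],
    (13, "F"): [1.0, 1.5, 2.0, 2.5, 3.0],
    (14, "F"): [1.5, 2.0, 2.5, 3.0, 3.5],
}


def percentile_of(score, sorted_scores):
    """Fraction of reference scores at or below `score`."""
    return bisect_right(sorted_scores, score) / len(sorted_scores)


def score_at(percentile, sorted_scores):
    """Reference score at `percentile`, using a rounded nearest-rank rule."""
    n = len(sorted_scores)
    rank = min(max(1, round(percentile * n)), n)
    return sorted_scores[rank - 1]


def mature(pretest_score, age, gender, years):
    """Advance a virtual case by `years`, holding its percentile constant."""
    pct = percentile_of(pretest_score, REFERENCE[(age, gender)])
    return score_at(pct, REFERENCE[(age + years, gender)])


def use_probability(score, intercept, slope):
    """Logistic model of substance use given a latent score.

    `intercept` and `slope` stand in for fitted coefficients (assumed).
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def prevalence(probabilities):
    """Group prevalence estimate: the mean of per-case probabilities."""
    return sum(probabilities) / len(probabilities)
```

For example, `mature(1.5, 12, "F", 2)` places a 12-year-old at the same percentile of the hypothetical 14-year-old distribution; passing the resulting scores through `use_probability` and then `prevalence` gives a group-level estimate.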
Results: Treatment and control group data from two archived randomized controlled trials were used to test the virtual control algorithm. We set two criteria for judging the adequacy of virtual control case generation. The first was that aggregated virtual control case alcohol, cigarette and marijuana prevalence should match treatment group prevalence at pretest. The second was that patterns of onset should mimic live control changes in prevalence over time. The algorithm successfully matched pretest prevalence for both RCTs. Quality of matching was judged by calculating effect size differences between live and virtual controls. Increases in prevalence were successfully modeled, although there were discrepancies between live and virtual control outcomes.
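As an illustration of how a difference between live and virtual control prevalence might be expressed as an effect size, the sketch below uses Cohen's h for two proportions. The abstract does not state which effect size measure was used, so the choice of Cohen's h and the function name are assumptions made for this example.

```python
import math


def cohens_h(p_live, p_virtual):
    """Cohen's h: difference between arcsine-transformed proportions.

    Assumed measure for illustration; the study's measure is not specified.
    """
    phi_live = 2.0 * math.asin(math.sqrt(p_live))
    phi_virtual = 2.0 * math.asin(math.sqrt(p_virtual))
    return phi_live - phi_virtual
```

A value near zero indicates that the two prevalence estimates are close; the sign shows which group has the higher prevalence.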
Conclusions: Our initial attempt at using virtual control cases to test program effectiveness is promising. However, additional data from archived longitudinal studies are needed to strengthen the algorithm, particularly for estimating prevalence among high school-aged adolescents.