Abstract: Virtual Controls: A Big Data Approach for Evaluating Disseminated Prevention Programs (Society for Prevention Research 27th Annual Meeting)

544 Virtual Controls: A Big Data Approach for Evaluating Disseminated Prevention Programs

Friday, May 31, 2019
Seacliff C (Hyatt Regency San Francisco)
William B. Hansen, PhD, Senior Research Scientist, Prevention Strategies, Brown Summit, NC
Edward H. Ip, PhD, Professor, Wake Forest University, Winston-Salem, NC
Edward H. Saldana, MS, Biostatistician, Wake Forest University, Winston-Salem, NC
Shyh-Huei Chen, PhD, Assistant Professor, Wake Forest University, Winston-Salem, NC
Introduction: Very few alcohol, tobacco and marijuana prevention programs are evaluated once they are disseminated. Randomized controlled trials (RCTs) have long been the standard by which the effectiveness of a preventive intervention is judged. The essential element that defines these studies is the random assignment of groups either to receive the intervention or to serve as untreated controls. Under conditions of dissemination, randomizing to a control group is not possible. We present an alternative strategy we have named “virtual controls”.

Methods: We integrated relevant data from eight archived longitudinal datasets focused on adolescents. After harmonizing the data and imputing missing values, we found that several psychosocial variables measured both in the integrated dataset and in the RCTs to be tested (intentions, normative beliefs and values) were strong predictors of substance use outcomes. We developed an algorithm that used percentiles from the integrated dataset to create age- and gender-specific latent psychosocial scores. For each treatment case, the algorithm matched age, gender and observed psychosocial scores at pretest to create a virtual control case, which was then allowed to “mature” according to age-related changes while its percentile was held constant. Each virtual control was present at the same waves as its matched treatment case, eliminating differential attrition as a threat to validity. Substance use for each virtual case was estimated from its latent psychosocial score using logistic regression coefficients derived from analysis of the treatment group. Averaging across virtual cases yielded group-level prevalence estimates.
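The Methods steps can be illustrated with a minimal Python sketch. It is not the authors' algorithm, only one way the described logic could work under stated assumptions: a single latent psychosocial score per case; a hypothetical `norms` table of sorted reference scores keyed by (age, gender) standing in for the integrated dataset; mid-rank empirical percentiles; annual waves, so a case observed `wave` years after pretest is read against the norms for `age + wave`; and logistic coefficients `b0`, `b1` taken as given (the abstract says they came from the treatment group, and they are not fitted here). All names (`TreatmentCase`, `percentile_rank`, `score_at_percentile`, `virtual_control`, `virtual_prevalence`) are hypothetical.

```python
import bisect
import math
from dataclasses import dataclass

# Hypothetical norms table: for each (age, gender) cell, the sorted latent
# psychosocial scores observed in the integrated archival dataset.
Norms = dict[tuple[int, str], list[float]]


@dataclass
class TreatmentCase:
    age: int                   # age at pretest
    gender: str
    pretest_score: float       # observed psychosocial score at pretest
    observed_waves: list[int]  # years after pretest at which the case was observed


def percentile_rank(scores: list[float], x: float) -> float:
    """Mid-rank percentile of x within sorted reference scores, in (0, 1)."""
    below = bisect.bisect_left(scores, x)
    at_or_below = bisect.bisect_right(scores, x)
    return (below + at_or_below) / (2 * len(scores))


def score_at_percentile(scores: list[float], p: float) -> float:
    """Inverse of percentile_rank: interpolate the score at percentile p."""
    n = len(scores)
    pos = min(max(p * n - 0.5, 0.0), n - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, n - 1)
    return scores[lo] + (pos - lo) * (scores[hi] - scores[lo])


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def virtual_control(case: TreatmentCase, norms: Norms,
                    b0: float, b1: float) -> dict[int, float]:
    """Predicted probability of use for one virtual control at each wave.

    The virtual case keeps its pretest percentile and "matures" by reading its
    latent score from the norms for its older age. It exists only at the waves
    where its treatment case was observed, mirroring that case's attrition.
    """
    p = percentile_rank(norms[(case.age, case.gender)], case.pretest_score)
    probs = {}
    for wave in case.observed_waves:
        ref = norms[(case.age + wave, case.gender)]
        latent = score_at_percentile(ref, p)
        probs[wave] = logistic(b0 + b1 * latent)
    return probs


def virtual_prevalence(cases: list[TreatmentCase], norms: Norms,
                       b0: float, b1: float) -> dict[int, float]:
    """Average predicted probabilities across virtual cases present at each wave."""
    by_wave: dict[int, list[float]] = {}
    for case in cases:
        for wave, prob in virtual_control(case, norms, b0, b1).items():
            by_wave.setdefault(wave, []).append(prob)
    return {w: sum(v) / len(v) for w, v in sorted(by_wave.items())}


if __name__ == "__main__":
    # Toy norms and coefficients, for illustration only.
    norms = {(12, "F"): [1.0, 2.0, 3.0, 4.0], (13, "F"): [2.0, 3.0, 4.0, 5.0]}
    cases = [TreatmentCase(12, "F", 2.0, [0, 1]),
             TreatmentCase(12, "F", 4.0, [0])]  # second case dropped out after pretest
    print(virtual_prevalence(cases, norms, b0=-3.0, b1=1.0))
```

In the toy run, the first case sits at the same percentile at ages 12 and 13, so its latent score rises from 2.0 to 3.0 only because the older age group's norms are higher. The second case contributes to the pretest estimate only, just as its treatment counterpart does.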

Results: Treatment and control group data from two archived randomized controlled trials were used to test the virtual control algorithm. We set two criteria for judging the adequacy of virtual control case generation. First, the aggregated virtual control prevalence of alcohol, cigarette and marijuana use should match treatment group prevalence at pretest. Second, patterns of onset should mimic the changes in prevalence over time observed in live controls. The algorithm successfully matched pretest prevalence for both RCTs. Quality of matching was judged by calculating effect size differences between live and virtual controls. Increases in prevalence were successfully modeled, although there were discrepancies between live and virtual control outcomes.
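The abstract does not say which effect size was used to compare live and virtual controls. For two prevalences, one common choice is Cohen's h, h = 2·asin(√p1) − 2·asin(√p2). The sketch below assumes Cohen's h for both criteria and a hypothetical cutoff of |h| < 0.2 (Cohen's conventional "small" benchmark) for an adequate pretest match. The function names and the prevalence values in the demo are illustrative only.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h for two proportions: 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def compare_controls(live: dict[int, float],
                     virtual: dict[int, float]) -> dict[int, float]:
    """Effect size (virtual minus live) at each wave present in both series."""
    return {w: cohens_h(virtual[w], live[w]) for w in sorted(live) if w in virtual}


# Hypothetical threshold: treat |h| < 0.2 (Cohen's "small") as an adequate match.
MATCH_THRESHOLD = 0.2


def pretest_matches(treatment_pretest: float, virtual_pretest: float) -> bool:
    """First criterion: virtual prevalence matches treatment prevalence at pretest."""
    return abs(cohens_h(virtual_pretest, treatment_pretest)) < MATCH_THRESHOLD


if __name__ == "__main__":
    # Made-up prevalence series (wave -> proportion using), for illustration only.
    live = {0: 0.20, 1: 0.30}
    virtual = {0: 0.20, 1: 0.25}
    print(compare_controls(live, virtual))  # second criterion, wave by wave
    print(pretest_matches(0.20, virtual[0]))
```

Cohen's h is used here because it places differences between proportions on a common scale regardless of base rate, which suits comparing prevalences that grow across adolescence. Any other effect size for proportions could be substituted.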

Conclusions: Our initial attempt to use virtual control cases as a means of testing program effectiveness is promising. However, additional data from archived longitudinal studies are needed to strengthen the algorithm, particularly for estimating prevalence among high school-aged adolescents.

William B. Hansen
Prevention Strategies: Employment with a For-profit organization