Schedule:
Thursday, June 2, 2016
Grand Ballroom B (Hyatt Regency San Francisco)
Prevention scientists have begun exploring new methods for synthesizing data across multiple studies to test moderation and mediation of preventive program effects. Synthesis methods such as integrated data analysis (IDA), which combine individual-level datasets, have many advantages over traditional meta-analysis but also bring new challenges. When different trials use different measures of the same construct, questions arise about construct equivalence and harmonization across measures, and about whether measurement methods are invariant across population characteristics such as SES or ethnicity, a key issue when testing for disparities in preventive effects. Quantitative methods for studying measurement equivalence have been developed for item-level and score-level data; in this study we extended these methods to study construct equivalence in more complex datasets involving multiple measures.

We used a dataset combining item-level and individual-level data from 5,210 adolescents in 19 prevention trials designed to reduce risk for adolescent depression. Collectively these trials employed 7 different measures of depression based on adolescent, parent, or clinician report. We tested 4 approaches to harmonizing these depression measures. Using item-level data, we specified a single-factor item response theory (IRT) model and compared it to a bifactor IRT model that added three secondary factors, one per reporter (sketched below). Using score-level data, we specified a confirmatory factor analysis (CFA) model that included trial-level covariates to account for cross-trial differences. We also created two observed-score models: the first standardized scores within measure, using variances based on all available data across trials, since several measures were used in more than one trial; the second standardized scores within trial, analogous to the effect-size estimates used in standard meta-analysis (see the standardization sketch below). We then tested whether our measurement models were invariant across SES and ethnicity.

Findings indicated that depression scores were strongly influenced by reporter method variance. Once this variance was taken into account, factor score estimates differed sharply from standardized observed scores. At least in this dataset, clinician ratings were only very weakly associated with the common factor, suggesting that construct equivalence did not hold for that reporting method.

We discuss the implications of these findings for both standard meta-analysis and IDA, including the importance of testing for measurement and construct invariance when individual- or item-level data are available, and of establishing invariance when testing whether prevention programs have equitable impact across SES or ethnicity.
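As a rough illustration of the bifactor structure described above, the following sketch computes item endorsement probabilities under a bifactor two-parameter logistic (2PL) model, in which each item loads on a general depression factor plus exactly one of three reporter-specific factors (adolescent, parent, clinician). This is a simplified binary-item sketch, not the study's actual model or estimates; the parameter values, array shapes, and the 2PL form itself are illustrative assumptions (the original instruments may well be polytomous).

```python
import numpy as np

def bifactor_2pl_prob(theta_g, theta_r, a_g, a_r, reporter, d):
    """P(item endorsed) under a bifactor 2PL model.

    theta_g  : (n_persons,) general depression factor scores
    theta_r  : (n_persons, 3) reporter-specific factor scores
               (columns: adolescent, parent, clinician)
    a_g      : (n_items,) loadings on the general factor
    a_r      : (n_items,) loadings on the item's reporter factor
    reporter : (n_items,) index in {0, 1, 2} mapping each item
               to its reporter-specific factor
    d        : (n_items,) item intercepts
    """
    # Each item loads on the general factor and exactly one
    # secondary (reporter) factor -- the defining bifactor restriction.
    z = (theta_g[:, None] * a_g[None, :]
         + theta_r[:, reporter] * a_r[None, :]
         + d[None, :])
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative call with made-up values: 4 persons, 5 items.
rng = np.random.default_rng(0)
theta_g = rng.normal(size=4)
theta_r = rng.normal(size=(4, 3))
a_g = np.array([1.2, 0.9, 1.5, 0.3, 0.4])
a_r = np.array([0.5, 0.6, 0.4, 1.1, 1.0])
reporter = np.array([0, 0, 1, 2, 2])  # adolescent x2, parent, clinician x2
d = np.zeros(5)
print(bifactor_2pl_prob(theta_g, theta_r, a_g, a_r, reporter, d))
```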
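The score-level CFA with trial-level covariates can be written in MIMIC form; the notation below is a generic reconstruction of such a model, not the authors' exact specification:

\[
y_{ij} = \nu_j + \lambda_j \eta_i + \varepsilon_{ij}, \qquad
\eta_i = \gamma^{\top} x_{t(i)} + \zeta_i ,
\]

where \(y_{ij}\) is adolescent \(i\)'s score on measure \(j\), \(\eta_i\) is the common depression factor, \(x_{t(i)}\) collects covariates for the trial \(t(i)\) in which adolescent \(i\) participated, and \(\gamma\) absorbs cross-trial differences in factor means.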
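The two observed-score harmonizations differ only in the grouping used for standardization. A minimal pandas sketch, assuming a long-format table with hypothetical columns 'trial', 'measure', and 'score':

```python
import pandas as pd

def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Add both standardized-score variants to a long-format dataset."""
    out = df.copy()
    # Approach 1: z-score within measure, pooling every trial that
    # administered that instrument (uses all available data per measure).
    grp_m = out.groupby('measure')['score']
    out['z_within_measure'] = (out['score'] - grp_m.transform('mean')) / grp_m.transform('std')
    # Approach 2: z-score within trial, analogous to the standardized
    # effect sizes used in conventional meta-analysis.
    grp_t = out.groupby('trial')['score']
    out['z_within_trial'] = (out['score'] - grp_t.transform('mean')) / grp_t.transform('std')
    return out
```

Pooling within measure borrows information from every trial that used a given instrument, whereas standardizing within trial mirrors the per-study effect-size scaling of standard meta-analysis.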
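Invariance across SES and ethnicity is typically checked by comparing a multi-group model in which loadings and intercepts are constrained equal across groups against one in which they vary freely. A minimal sketch of the resulting chi-square difference (likelihood-ratio) test, assuming the log-likelihoods come from whatever estimation routine is used; the function name and inputs are hypothetical:

```python
from scipy.stats import chi2

def invariance_lrt(loglik_constrained, loglik_free, df_diff):
    """Likelihood-ratio test for measurement invariance.

    loglik_constrained : log-likelihood with parameters (e.g., loadings,
                         intercepts) held equal across SES or ethnic groups
    loglik_free        : log-likelihood with those parameters free to differ
    df_diff            : number of equality constraints released
    """
    stat = 2.0 * (loglik_free - loglik_constrained)
    p_value = chi2.sf(stat, df_diff)  # upper-tail chi-square probability
    return stat, p_value

# Illustrative call with made-up log-likelihoods and 6 released constraints.
print(invariance_lrt(-10450.3, -10441.8, 6))
```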