Introduction: In this presentation we outline a general framework for thinking about the dual tasks of assessing construct equivalence and achieving measurement harmonization as a foundation for meta-analysis or integrative data analysis of prevention trial results. We first discuss the assumptions inherent in the notion of construct equivalence and review statistical methods for detecting or adjusting for violations of these assumptions, comparing situations where complete individual-level data are available with those where only summary statistics are available for each study. We then discuss the problem of partial measurement, which arises when different studies use different measures of the same construct, and draw on concepts from network theory to introduce a method for characterizing important variants of multilevel datasets in terms of the type and extent of data available for testing these assumptions.
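The network-theory characterization described above can be pictured as a bipartite study–measure graph: harmonization across all studies is possible only when the graph is connected, so that every pair of measures is linked by some chain of studies with overlapping measurement. A minimal sketch of this connectivity check, with hypothetical trial labels and depression measures:

```python
from collections import defaultdict, deque

def measures_connected(study_measures):
    """Return True if the bipartite study-measure graph is connected,
    i.e., every pair of measures is reachable through a chain of
    studies that administered overlapping measures."""
    # Build an adjacency map over ("study", s) and ("measure", m) nodes.
    adj = defaultdict(set)
    for study, measures in study_measures.items():
        for m in measures:
            adj[("study", study)].add(("measure", m))
            adj[("measure", m)].add(("study", study))
    if not adj:
        return True
    # Breadth-first search from an arbitrary node.
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(adj)

# Hypothetical layout: trials A-C are chained through trial B,
# which administered both measures.
linked = {"A": ["CDI"], "B": ["CDI", "CES-D"], "C": ["CES-D"]}
# Trial D shares no measure with trial A: the graph is disconnected,
# so no common scale can be established between them.
unlinked = {"A": ["CDI"], "D": ["BDI"]}
```

Here a disconnected graph signals that the data cannot, on their own, support placing all studies on a common metric without additional assumptions.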
Method: Building on the work of Curran, Bauer, and Hussong, we present a strategy that combines item response theory (IRT) models with scaling methods from standard meta-analysis to make maximal use of measurement information in situations of partial measurement. We illustrate the method by applying it to individual-level data combined from 12 randomized prevention trials designed to reduce the risk of depression in adolescents. The 12 trials, with 3,102 participants, include five different measures of depressive symptoms, and several trials used more than one measure.
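One simple way the scaling step can work, under strong simplifying assumptions, is mean/sigma linking: difficulty estimates for anchor items calibrated separately in two studies determine a linear transformation placing one study's latent-trait scale on the other's. The numbers below are purely illustrative, not estimates from the trials described here:

```python
import statistics

def mean_sigma_link(b_ref, b_new):
    """Mean/sigma linking: given difficulty estimates for the same
    anchor items calibrated in a reference study (b_ref) and a new
    study (b_new), return (A, B) so that theta_ref = A * theta_new + B."""
    A = statistics.stdev(b_ref) / statistics.stdev(b_new)
    B = statistics.mean(b_ref) - A * statistics.mean(b_new)
    return A, B

# Hypothetical anchor-item difficulties from two separate calibrations.
b_ref = [-1.0, 0.0, 1.0]
b_new = [-0.5, 0.5, 1.5]
A, B = mean_sigma_link(b_ref, b_new)

theta_new = 0.5                # a score on the new study's scale
theta_ref = A * theta_new + B  # the same score on the reference scale
```

In this toy case the two calibrations differ only by a shift, so A = 1 and B moves scores down by 0.5; with real data the anchor items must also satisfy measurement-invariance assumptions for the transformation to be meaningful.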
Results: IRT analyses using a graded response model support a single-factor solution in these data. A series of simulations based on these data clarifies how increasing sparseness of measurement imposes more stringent assumptions about measure equivalence and reduces the capacity to test those assumptions.
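The graded response model referenced above expresses each ordered item response through cumulative logistic functions of the latent trait: P(X ≥ k) = logistic(a(θ − b_k)), with category probabilities obtained as differences of adjacent cumulative probabilities. A minimal single-item sketch (discrimination and threshold values hypothetical):

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Graded response model: P(X = k | theta) for one ordered item.
    `a` is the item discrimination; `thresholds` are the ordered
    category boundaries b_1 < ... < b_{K-1}."""
    def cum(b):
        # Cumulative probability of responding in category k or higher.
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))
    # Bracket with 1 (everyone reaches the lowest category) and 0.
    cums = [1.0] + [cum(b) for b in thresholds] + [0.0]
    # Category probabilities are differences of adjacent cumulatives.
    return [cums[k] - cums[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical four-category symptom item, respondent at the trait mean.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

Because the model is driven by a single θ, fitting it jointly across trials is what yields the one-factor harmonized scale; sparser item overlap leaves fewer anchor items with which to test that the same parameters hold in every trial.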
Conclusions: IRT models can be used to create harmonized measures for synthesizing findings from multiple trials, but they require stronger, and perhaps untenable, assumptions when measures overlap only weakly across trials. Developing and using a common measure or set of measures in future trials would permit better tests of construct equivalence and more precise evaluation of overall trial effects in data synthesis, including tests of moderation and mediation of those effects.