Abstract: Latent Variable-Based Methods for Harmonizing Data across Studies: Findings in Integrative Data Analysis (Society for Prevention Research 26th Annual Meeting)

448 Latent Variable-Based Methods for Harmonizing Data across Studies: Findings in Integrative Data Analysis

Schedule:
Friday, June 1, 2018
Congressional C (Hyatt Regency Washington, Washington, DC)
* noted as presenting author
Veronica Cole, PhD, Postdoctoral Research Associate, University of North Carolina at Chapel Hill, Chapel Hill, NC
Introduction: Integrative data analysis (IDA), defined as the analysis of data pooled at the individual record level across studies, offers a number of important advantages for substance use research, including direct tests of effect reproducibility, enhanced power, the ability to examine low base-rate behaviors, and the opportunity to examine behavior change over a longer span of the life course. A key challenge for IDA, however, is the need for common measures across studies. Even when assessing the same constructs, studies often differ subtly in the measurement of core variables, raising the question of whether scale scores truly have the same meaning and metric in all studies. Drawing on the factor-analytic and item response theory literatures on measurement invariance and scale equating, we have proposed the use of latent variable measurement models to quantify and account for differences in measurement across subjects. The scale scores generated from these models provide psychometrically harmonized common variables for use in IDA.
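
For readers unfamiliar with this class of model, the block below gives one common way a moderated latent variable measurement model is written. It is a sketch, not a formulation taken from this abstract: the symbols (item $i$, person $j$, covariates $x_{kj}$ such as study membership or demographics, link function $g$) are assumptions introduced here for illustration.

```latex
% Measurement model: item intercepts and loadings may vary with covariates
g\!\left(E[y_{ij} \mid \eta_j, x_j]\right) = \nu_{ij} + \lambda_{ij}\,\eta_j
\nu_{ij} = \nu_{0i} + \sum_{k} \nu_{ki}\, x_{kj}
\lambda_{ij} = \lambda_{0i} + \sum_{k} \lambda_{ki}\, x_{kj}

% Latent variable: its mean and variance may also vary with covariates
\eta_j \sim N(\alpha_j, \psi_j)
\alpha_j = \alpha_0 + \sum_{k} \gamma_k\, x_{kj}
\psi_j = \psi_0 \exp\!\left(\sum_{k} \beta_k\, x_{kj}\right)
```

In this notation, nonzero $\nu_{ki}$ or $\lambda_{ki}$ represent measurement differences for item $i$, while $\gamma_k$ and $\beta_k$ represent genuine differences in the construct. Scores for $\eta_j$ estimated under such a model are what the abstract calls psychometrically harmonized.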

Methods: Here we present the results of several computer simulations assessing the adequacy of psychometrically harmonized scores obtained from moderated nonlinear factor analysis (MNLFA) models. Our general strategy is to (1) simulate data representing multiple samples arising from independent substance use studies which also differ from one another in measurement and in terms of relevant demographic characteristics; (2) combine these samples into one large dataset; (3) apply MNLFA to obtain scores which adjust for non-construct-related differences in measurement between persons (e.g., differences due to between-study variations in measurement); and (4) assess the accuracy of these scores.
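
The four-step strategy can be illustrated with a toy example. The sketch below is not MNLFA and does not reproduce the simulations reported here: it swaps in a deliberately simple linear one-factor model with continuous items, a triad estimator for the loadings, and an anchor-item adjustment for intercept differences. All numeric values, the choice of anchor items, and the names `simulate_study`, `score`, `LOADINGS`, and `ANCHORS` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2018)
LOADINGS = np.array([1.0, 0.8, 1.2, 0.9])  # hypothetical population loadings
ANCHORS = [0, 1]                           # items assumed invariant across studies
N = 2000                                   # hypothetical sample size per study

def simulate_study(n, latent_mean, intercept_shift):
    """Draw true factor scores and continuous item responses for one study."""
    eta = rng.normal(latent_mean, 1.0, size=n)
    noise = rng.normal(0.0, 1.0, size=(n, LOADINGS.size))
    return eta, intercept_shift + np.outer(eta, LOADINGS) + noise

# Step 1: two studies differing in latent mean and in two item intercepts.
eta_a, y_a = simulate_study(N, 0.0, np.zeros(4))
eta_b, y_b = simulate_study(N, 0.5, np.array([0.0, 0.0, -1.5, -1.5]))

# Step 2: pool the samples into one dataset.
eta = np.concatenate([eta_a, eta_b])
y = np.vstack([y_a, y_b])
study_b = np.repeat([0.0, 1.0], N)  # 0 = study A (reference), 1 = study B

# Step 3a: estimate loadings from the averaged within-study covariance
# (triad estimator; assumes a factor variance of 1 in both studies).
cov = (np.cov(y_a, rowvar=False) + np.cov(y_b, rowvar=False)) / 2
lam = np.empty(4)
for i in range(4):
    j, k = (i + 1) % 4, (i + 2) % 4
    lam[i] = np.sqrt(cov[i, j] * cov[i, k] / cov[j, k])

# Step 3b: estimate study-specific intercept shifts using the anchor items.
gap = y_b.mean(axis=0) - y_a.mean(axis=0)     # observed item mean differences
d_hat = np.mean(gap[ANCHORS] / lam[ANCHORS])  # latent mean difference from anchors
dif = gap - lam * d_hat                       # remaining shift = measurement difference
dif[ANCHORS] = 0.0

def score(items, intercepts):
    """Bartlett-type factor score, assuming equal unique variances."""
    return (items - intercepts) @ lam / (lam @ lam)

nu_ref = y_a.mean(axis=0)                     # reference-study intercepts
naive = score(y, nu_ref)                      # ignores between-study measurement differences
adjusted = score(y - np.outer(study_b, dif), nu_ref)

# Step 4: assess accuracy against the simulated (true) factor scores.
corr_naive = np.corrcoef(naive, eta)[0, 1]
corr_adjusted = np.corrcoef(adjusted, eta)[0, 1]
in_b = study_b == 1
diff_naive = naive[in_b].mean() - naive[~in_b].mean()
diff_adjusted = adjusted[in_b].mean() - adjusted[~in_b].mean()
print(f"correlation with truth: naive={corr_naive:.3f}, adjusted={corr_adjusted:.3f}")
print(f"estimated B - A mean difference: naive={diff_naive:.3f}, adjusted={diff_adjusted:.3f}")
```

In this toy setup the intercept shifts in study B run opposite to its higher true mean, so the unadjusted score misstates the between-study difference, while the adjusted score should track the simulated factor scores more closely. A real application would estimate the measurement differences within the scoring model rather than assume which items are invariant.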

Results: MNLFA yields scores that are highly accurate, as indexed by their correlation with population values, as long as all relevant covariates (i.e., demographic characteristics) are included in the scoring model. Additionally, the use of MNLFA scores as predictors in subsequent regression models yields unbiased effect estimates. By contrast, bias may be severe for scores arising from latent variable methods that do not take into account between-study differences in measurement.

Conclusions: Differences between studies in measurement are critical to take into account when analyzing data that have been pooled across studies. Latent variable measurement models like MNLFA offer the opportunity to construct psychometrically harmonized scores for IDA that adjust for differences between studies due to subtle variations in measurement. These scores perform well in simulation studies, but additional research is needed to confirm their advantages with real data.