Abstract: Propensity Score Analysis with a Latent or Mis-Measured Covariate: Using Factor Scores from Inclusive Factor or Structural Equation Models to Reduce Bias (Society for Prevention Research 24th Annual Meeting)

176 Propensity Score Analysis with a Latent or Mis-Measured Covariate: Using Factor Scores from Inclusive Factor or Structural Equation Models to Reduce Bias

Schedule:
Wednesday, June 1, 2016
Pacific M (Hyatt Regency San Francisco)
* noted as presenting author
Trang Quynh Nguyen, PhD, Postdoctoral Research Fellow, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
Hwanhee Hong, PhD, Postdoctoral Fellow, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
Elizabeth A. Stuart, PhD, Professor, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
Introduction: Propensity score methods assume that covariates used to model treatment assignment are measured without error. This is problematic when a covariate X is measured with error or is a latent variable. For example, to estimate the effect of high school substance use on depression in young adulthood, we may use propensity score methods to balance pre-exposure covariates; these, however, include depressive symptomatology, a latent X. It has been suggested that if multiple measurements (Ws) of X are available, then rather than using the Ws (or their mean/sum) directly in propensity score analysis, it is better to fit a latent factor model to the Ws, estimate a factor score (FS), and use this FS to represent X. Evaluation of this approach to date is limited.
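
The problem can be illustrated with a small simulation. The sketch below is not the authors' analysis: the sample size, coefficients, error variances, and helper names (`fit_propensity`, `ipw_effect`) are all assumptions chosen for illustration. It compares inverse probability weighting based on the true X (unavailable in practice) with weighting based on the mean of three error-prone Ws.

```python
import numpy as np

def fit_propensity(covariate, treatment, n_iter=25):
    """Logistic regression of treatment on one covariate (Newton's method)."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (treatment - p)
        hessian = (design * (p * (1.0 - p))[:, None]).T @ design
        beta += np.linalg.solve(hessian, gradient)
    return 1.0 / (1.0 + np.exp(-design @ beta))

def ipw_effect(outcome, treatment, propensity):
    """Normalized inverse-probability-weighted difference in mean outcomes."""
    w1 = treatment / propensity
    w0 = (1.0 - treatment) / (1.0 - propensity)
    return np.sum(w1 * outcome) / np.sum(w1) - np.sum(w0 * outcome) / np.sum(w0)

# Hypothetical data-generating model (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 50_000
true_effect = 1.0
x = rng.normal(size=n)                              # latent covariate X
w = x[:, None] + rng.normal(size=(n, 3))            # three noisy measurements Ws
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * x))).astype(float)
y = true_effect * t + x + rng.normal(size=n)        # X confounds the T-Y relation

bias_true_x = ipw_effect(y, t, fit_propensity(x, t)) - true_effect
bias_mean_w = ipw_effect(y, t, fit_propensity(w.mean(axis=1), t)) - true_effect
print(f"bias using true X: {bias_true_x:+.3f}")
print(f"bias using mean W: {bias_mean_w:+.3f}")
```

Under these assumptions, weighting on the mean W leaves part of the confounding by X unadjusted, so its estimate stays further from the true effect than the estimate based on X itself.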

Methods: In the context of propensity score weighting, we investigate this method (the simple FS) and two other types of FSs generated from factor or structural equation models (FA/SEM): T-inclusive FSs, from models that also include the treatment variable T, and TZ-inclusive FSs, from models that include both T and the other covariates Z from the propensity score model. We consider logit, probit and identity links for T. We address measurement error that is non-differential with respect to T, Z, and the outcome.
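
One way to make the three score types concrete is to write each as a conditional mean of X. The notation below is an assumption for illustration and is not taken from the abstract: it supposes a linear one-factor measurement model with k indicators, intercepts \nu_j, loadings \lambda_j, errors \varepsilon_j, and regression-type (posterior mean) factor scores.

```latex
\begin{align*}
\text{measurement model:} \quad & W_j = \nu_j + \lambda_j X + \varepsilon_j,
  \qquad j = 1, \dots, k \\
\text{simple FS:} \quad & \widehat{X}_{\mathrm{S}} = \mathrm{E}[X \mid W_1, \dots, W_k] \\
\text{T-inclusive FS:} \quad & \widehat{X}_{\mathrm{T}} = \mathrm{E}[X \mid W_1, \dots, W_k, T] \\
\text{TZ-inclusive FS:} \quad & \widehat{X}_{\mathrm{TZ}} = \mathrm{E}[X \mid W_1, \dots, W_k, T, Z]
\end{align*}
```

In this reading, the three scores differ only in the conditioning set, which is determined by which variables the FA/SEM includes alongside the Ws.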

Results: We focus on the case where the measurement errors are independent of one another. The simple FS and the direct Ws method result in similar bias; bias is higher with the mean W method if Ws’ correlations with X are not uniform. At extreme T prevalence (near 0 or 1), the simple FS performs better than the direct Ws method in terms of variance. Relative to the simple FS, T-inclusive FSs substantially reduce bias and, when X and Z are uncorrelated, bring bias to near zero. When X is correlated with Z, this approach is also biased (but to a lesser degree), due to incompatibility between the FS model and the propensity score model with respect to inclusion/exclusion of Z. Such bias is essentially eliminated by TZ-inclusive FSs generated from models that are saturated with respect to the X-T-Z joint distribution in model fitting and in FS computation, including: (1) the linear FA model with Ws, T and Z as indicators and a residual T-Z correlation; (2) the SEM based on the true model (Ws reflecting unobserved X, and Z and X correlated and influencing T) with logit/probit/identity link for T, fit using ML; and (3) a modified SEM with probit link fit using WLS.
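
The contrast between the simple and the T-inclusive FS can be sketched in a setting with no Z. The code below is an illustration under stated assumptions, not the authors' procedure: it replaces ML or WLS fitting with closed-form moment estimates of a one-factor model (three Ws, independent errors, Var(X) fixed at 1), treats T as an additional linear indicator (identity link), and uses hypothetical simulation values and helper names throughout.

```python
import numpy as np

def fit_propensity(covariate, treatment, n_iter=25):
    """Logistic regression of treatment on one covariate (Newton's method)."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (treatment - p)
        hessian = (design * (p * (1.0 - p))[:, None]).T @ design
        beta += np.linalg.solve(hessian, gradient)
    return 1.0 / (1.0 + np.exp(-design @ beta))

def ipw_effect(outcome, treatment, propensity):
    """Normalized inverse-probability-weighted difference in mean outcomes."""
    w1 = treatment / propensity
    w0 = (1.0 - treatment) / (1.0 - propensity)
    return np.sum(w1 * outcome) / np.sum(w1) - np.sum(w0 * outcome) / np.sum(w0)

# Hypothetical data-generating model (illustrative values, no covariate Z).
rng = np.random.default_rng(1)
n = 50_000
true_effect = 1.0
x = rng.normal(size=n)
w = x[:, None] + rng.normal(size=(n, 3))
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * x))).astype(float)
y = true_effect * t + x + rng.normal(size=n)

# Moment-based one-factor estimates: Cov(W_j, W_k) = lambda_j * lambda_k.
obs = np.column_stack([w, t])
obs_centered = obs - obs.mean(axis=0)
cov = np.cov(obs, rowvar=False)
s12, s13, s23 = cov[0, 1], cov[0, 2], cov[1, 2]
loadings = np.sqrt(np.array([s12 * s13 / s23, s12 * s23 / s13, s13 * s23 / s12]))
# With T as a linear indicator: Cov(W_j, T) = lambda_j * Cov(X, T).
cov_xt = np.mean(cov[:3, 3] / loadings)

# Simple FS: linear projection of X on the Ws only.
fs_simple = obs_centered[:, :3] @ np.linalg.solve(cov[:3, :3], loadings)
# T-inclusive FS: linear projection of X on the Ws and T.
fs_inclusive = obs_centered @ np.linalg.solve(cov, np.append(loadings, cov_xt))

bias_simple = ipw_effect(y, t, fit_propensity(fs_simple, t)) - true_effect
bias_inclusive = ipw_effect(y, t, fit_propensity(fs_inclusive, t)) - true_effect
print(f"bias with simple FS:      {bias_simple:+.3f}")
print(f"bias with T-inclusive FS: {bias_inclusive:+.3f}")
```

The only difference between the two scores here is whether T enters the projection; the propensity score model and the weighting step are identical for both.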

When some measurement errors are correlated, the performance of all FSs worsens if the FS model does not capture such correlations.

Conclusions: We recommend using one of these TZ-inclusive FSs to represent the mis-measured/latent X in propensity score analysis. We also recommend careful factor analysis of Ws to identify residual correlations needed in the FS model. For illustration, the method is applied to the above-mentioned example, using Add Health data.