Abstract: Potential Contributions of Machine Learning to Statistical Mediation Analysis (Society for Prevention Research 27th Annual Meeting)

211 Potential Contributions of Machine Learning to Statistical Mediation Analysis

Schedule:
Wednesday, May 29, 2019
Marina Room (Hyatt Regency San Francisco)
Oscar Gonzalez, MA, Graduate Student, Arizona State University, Tempe, AZ
Introduction: Statistical mediation analysis is a statistical procedure that identifies intermediate variables, known as mediators, that transmit the effect of an independent variable to a dependent variable (MacKinnon, 2008). It is commonly used in prevention research to identify the critical ingredients through which interventions achieve their effects. Recent contributions from causal mediation analysis have allowed researchers to define mediated effects independently of the statistical model used to estimate them (VanderWeele & Vansteelandt, 2009). In other words, researchers can move beyond linear models and use more flexible models to find evidence for mediation. One candidate approach to exploring relations among variables in a mediation model is to use machine learning algorithms. Machine learning is a data-analysis approach focused on prediction, in which algorithms learn patterns from the data and automate model building. Machine learning algorithms are especially good at identifying important predictors of an outcome and at capturing complex functional forms among variables. These strengths match two needs in mediation research: estimating mediated effects assumes that the correct functional form among variables has been specified, and an important aim of prevention research is to find the most important mediators of interventions for future implementation. Therefore, machine learning algorithms hold special promise for empirically modeling complex relations and identifying important mediators, instead of relying solely on theoretical justifications. The purpose of this presentation is to investigate and outline how several properties of machine learning algorithms could help us learn more about mediation processes.
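For concreteness, the standard potential-outcomes definitions from the causal mediation framework can be written as below. This is a general sketch, not the presentation's own notation, and it assumes a binary independent variable X (x = 1 vs. x = 0), with Y(x, m) the outcome under X = x and M = m, and M(x) the mediator under X = x. The last line is the mediation formula, which identifies each counterfactual mean assuming X is randomized and the mediator-outcome relation is unconfounded; nothing in it requires E[Y | X, M] to be linear.

```latex
\begin{aligned}
\text{NDE} &= E\left[Y(1, M(0)) - Y(0, M(0))\right] \\
\text{NIE} &= E\left[Y(1, M(1)) - Y(1, M(0))\right] \\
\text{TE}  &= E\left[Y(1, M(1)) - Y(0, M(0))\right] = \text{NDE} + \text{NIE} \\
E\left[Y(x, M(x^{*}))\right] &= \int E\left[Y \mid X = x, M = m\right] \, dP(m \mid X = x^{*})
\end{aligned}
```

As a check, if M = aX + e_M and Y = c'X + bM + e_Y with no X-by-M interaction, then NIE = b(E[M(1)] - E[M(0)]) = ab, the familiar product-of-coefficients estimate; with a flexible model for E[Y | X, M], the same definitions still apply.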

Method: Simulated datasets were used to investigate the properties of mediated effect estimation with machine learning algorithms, focusing specifically on the random forest algorithm. Two special cases were considered: a nonlinear relation between the mediators and the outcome, and multiple mediators in the model.
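A minimal sketch of the nonlinear case, under assumptions that are ours rather than the study's: a randomized binary X, a linear X -> M path, a threshold (step) M -> Y relation, and arbitrary parameter values. To stay dependency-free, it grows a small random forest in pure Python (bootstrap resamples, one randomly chosen candidate feature per split with fallback to the other feature if the first cannot split the node, and splits chosen to minimize squared error) instead of calling a library such as scikit-learn. The natural indirect effect is then estimated by plugging the forest's predictions of E[Y | X = 1, M = m] into the mediation formula: the average prediction over mediator values observed under X = 1, minus the average over mediator values observed under X = 0. All function names (simulate, build_tree, estimate_nie, and so on) are hypothetical.

```python
import math
import random


def simulate(n, a, b, c_prime, threshold, rng):
    """Hypothetical data-generating model: randomized binary X, linear X -> M,
    a threshold (step) M -> Y relation, and a direct X -> Y path."""
    data = []
    for _ in range(n):
        x = float(rng.randint(0, 1))
        m = a * x + rng.gauss(0.0, 1.0)
        step = 1.0 if m > threshold else 0.0
        y = c_prime * x + b * step + rng.gauss(0.0, 1.0)
        data.append(((x, m), y))  # features (X, M), outcome Y
    return data


def best_split(rows, feature, min_leaf):
    """Return (sse, index, sorted_rows) for the SSE-minimizing split on one feature, or None."""
    rows = sorted(rows, key=lambda r: r[0][feature])
    ys = [y for _, y in rows]
    n = len(rows)
    total, total_sq = sum(ys), sum(y * y for y in ys)
    best = None
    s = sq = 0.0
    for i in range(1, n):
        s += ys[i - 1]
        sq += ys[i - 1] ** 2
        if i < min_leaf or n - i < min_leaf:
            continue  # both children must keep at least min_leaf rows
        if rows[i - 1][0][feature] == rows[i][0][feature]:
            continue  # tied feature values cannot be separated
        sse = (sq - s * s / i) + ((total_sq - sq) - (total - s) ** 2 / (n - i))
        if best is None or sse < best[0]:
            best = (sse, i, rows)
    return best


def build_tree(rows, depth, rng, max_depth=6, min_leaf=5):
    """Grow a regression tree; leaves are floats, internal nodes are tuples."""
    mean_y = sum(y for _, y in rows) / len(rows)
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return mean_y
    # Try one randomly ordered feature first; fall back to the other if it cannot split.
    for feature in rng.sample([0, 1], 2):
        split = best_split(rows, feature, min_leaf)
        if split is not None:
            break
    else:
        return mean_y
    _, i, ordered = split
    cut = (ordered[i - 1][0][feature] + ordered[i][0][feature]) / 2.0
    left = build_tree(ordered[:i], depth + 1, rng, max_depth, min_leaf)
    right = build_tree(ordered[i:], depth + 1, rng, max_depth, min_leaf)
    return (feature, cut, left, right)


def predict(tree, x):
    while isinstance(tree, tuple):
        feature, cut, left, right = tree
        tree = left if x[feature] <= cut else right
    return tree


def fit_forest(data, n_trees, rng):
    # Each tree is grown on a bootstrap resample of the data.
    return [build_tree([rng.choice(data) for _ in data], 0, rng) for _ in range(n_trees)]


def forest_predict(forest, x):
    return sum(predict(t, x) for t in forest) / len(forest)


def estimate_nie(data, forest):
    """Mediation-formula estimate: mean f(1, m) over m observed under X = 1
    minus mean f(1, m) over m observed under X = 0."""
    m_treated = [m for (x, m), _ in data if x == 1.0]
    m_control = [m for (x, m), _ in data if x == 0.0]
    mean_t = sum(forest_predict(forest, (1.0, m)) for m in m_treated) / len(m_treated)
    mean_c = sum(forest_predict(forest, (1.0, m)) for m in m_control) / len(m_control)
    return mean_t - mean_c


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


rng = random.Random(2019)
a, b, c_prime, threshold = 1.0, 2.0, 0.3, 0.5  # hypothetical parameter values
data = simulate(1000, a, b, c_prime, threshold, rng)
forest = fit_forest(data, 50, rng)
nie_hat = estimate_nie(data, forest)
# True NIE = b * [P(M(1) > t) - P(M(0) > t)] for M(x) ~ N(a * x, 1)
true_nie = b * (phi(threshold) - phi(threshold - a))
print(f"estimated NIE = {nie_hat:.3f}, true NIE = {true_nie:.3f}")
```

Because the indirect effect is computed from the forest's predictions rather than from a product of regression coefficients, the same estimator applies to any M -> Y shape the forest can learn. The threshold form, sample size, and number of trees are illustrative only, and the estimate is causal only under the randomization and no-confounding assumptions stated for the mediation formula.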

Results: Based on previous research, preliminary results suggest that machine learning approaches can estimate the causal mediated effect, model complex nonlinear relations, and identify the predictors associated with the outcome. Theoretical and practical implications of using exploratory methods for causal reasoning are also discussed.

Conclusions: The results suggest that researchers should consider adding machine learning algorithms to their toolbox. This paper also encourages researchers to consider the causal approach to statistical mediation as a more comprehensive framework for studying mediated effects.