Abstract: Potential Contributions of Machine Learning Methods to Screening Efforts in Pediatric Settings (Society for Prevention Research 25th Annual Meeting)

505 Potential Contributions of Machine Learning Methods to Screening Efforts in Pediatric Settings

Schedule:
Friday, June 2, 2017
Lexington (Hyatt Regency Washington, Washington DC)
William Pelham, MA, Predoctoral Fellow, Arizona State University, Tempe, AZ
Introduction: There is a clear need for methods that can yield rapid, reliable, and prospective predictions of whether a child or teen is at risk for various negative outcomes. One promising and largely untapped approach to this problem is machine learning (also called “statistical learning,” “data mining,” or “predictive modeling”), a class of techniques arising from statistics, computer science, and engineering that seeks to build data-driven predictive algorithms. These techniques are most notably distinguished from the “traditional” statistical methods typically used in psychological research (e.g., ordinary least squares regression) by their heavy emphasis on predicting future cases, often sacrificing simplicity and interpretability to increase predictive power. Because machine learning techniques were developed with the explicit goal of prediction, they may offer advantages over traditional approaches to screening in pediatric care settings.

Methods: We use data from the Pittsburgh Youth Study to explore the potential contribution of machine learning methods to screening efforts in pediatric or school settings. Specifically, we use teacher reports of child behavior during fourth or fifth grade (item parcels from the Teacher Report Form) to predict official records of arrests for violent crimes later in life. First, we train nine predictive algorithms to optimally predict which participants will be arrested for a violent crime: [1] logistic regression, [2] lasso regression, [3] classification tree, [4] bagged classification trees, [5] boosted classification trees, [6] random forest, [7] k-nearest neighbors, [8] support vector machines, and [9] neural networks. Second, we evaluate the absolute and relative performance of these algorithms on holdout data (i.e., cases from the same sample that were not used for training).
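The two-step design (fit on a training portion, then score a holdout portion) can be sketched in plain Python for two of the nine algorithms, logistic regression and k-nearest neighbors. This is a minimal illustration under stated assumptions, not the study's pipeline: the data are synthetic stand-ins for the Teacher Report Form parcels, the 600/200 split and k = 15 are arbitrary, the helper names (make_synthetic_data, fit_logistic, knn_scores) are hypothetical, and AUC is assumed as the performance metric because the abstract does not name one.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic_data(n, n_features, seed):
    """Hypothetical stand-in for teacher-rated item parcels: Gaussian
    predictors and a binary outcome whose log-odds depend on the first three."""
    rng = random.Random(seed)
    true_weights = [1.0, 0.8, 0.5] + [0.0] * (n_features - 3)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        logit = -1.5 + sum(w * xi for w, xi in zip(true_weights, x))
        X.append(x)
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y


def auc(scores, labels):
    """Area under the ROC curve via Mann-Whitney pairwise comparisons (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Unpenalized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            grad_b += err
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b


def linear_scores(model, X):
    """Predicted log-odds; monotone in the predicted probability, so AUC is unchanged."""
    w, b = model
    return [b + sum(wi * xi for wi, xi in zip(w, x)) for x in X]


def knn_scores(X_train, y_train, X_new, k):
    """Share of the k nearest training cases (Euclidean) with a positive outcome."""
    scores = []
    for x in X_new:
        dists = sorted(
            (sum((a - c) ** 2 for a, c in zip(x, xt)), t)
            for xt, t in zip(X_train, y_train)
        )
        scores.append(sum(t for _, t in dists[:k]) / k)
    return scores


# Step 1: train on one portion of the sample; Step 2: score the held-out portion.
X, y = make_synthetic_data(800, n_features=6, seed=2017)
X_train, y_train = X[:600], y[:600]
X_test, y_test = X[600:], y[600:]

model = fit_logistic(X_train, y_train)
results = {
    "logistic regression": auc(linear_scores(model, X_test), y_test),
    "k-nearest neighbors (k=15)": auc(knn_scores(X_train, y_train, X_test, k=15), y_test),
}
for name, value in results.items():
    print(f"{name}: holdout AUC = {value:.3f}")
```

Because only the holdout cases are scored, the comparison reflects how each algorithm would rank children it has never seen, which is the quantity a screener cares about. The remaining seven algorithms would slot into the same loop; in practice a library such as scikit-learn would replace these hand-written versions.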

Results: More sophisticated machine learning algorithms, such as random forest and boosted classification trees, outperformed traditional logistic regression in predicting which children would later be arrested for a violent crime. Collapsing across all algorithms, the predictors identified as most important were the child’s aggression, oppositionality/defiance, lack of guilt, academic achievement, and inattention. There was a significant drop-off in performance from the training data to the testing data, underscoring the importance of evaluating a screener’s predictions on holdout data.
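Why training-set performance overstates a screener's accuracy can be illustrated with a deliberately extreme case: a 1-nearest-neighbor classifier, which memorizes its training data. This is a minimal sketch on hypothetical synthetic data with label noise, not a reproduction of the study's results; the helper names are illustrative, and AUC is again an assumed metric.

```python
import random


def make_synthetic_data(n, n_features, seed):
    """Hypothetical data: Gaussian predictors and a noisy binary outcome
    driven by the first two predictors."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        # Label noise: the outcome only partly follows the predictors.
        y.append(1 if x[0] + 0.5 * x[1] + rng.gauss(0.0, 1.0) > 1.0 else 0)
        X.append(x)
    return X, y


def auc(scores, labels):
    """Area under the ROC curve via Mann-Whitney pairwise comparisons (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def knn_scores(X_train, y_train, X_new, k):
    """Share of the k nearest training cases (Euclidean) with a positive outcome."""
    scores = []
    for x in X_new:
        dists = sorted(
            (sum((a - c) ** 2 for a, c in zip(x, xt)), t)
            for xt, t in zip(X_train, y_train)
        )
        scores.append(sum(t for _, t in dists[:k]) / k)
    return scores


X, y = make_synthetic_data(800, n_features=6, seed=25)
X_train, y_train = X[:600], y[:600]
X_test, y_test = X[600:], y[600:]

# With k=1 every training case is its own nearest neighbor (distance 0),
# so training predictions simply echo the training labels.
train_auc = auc(knn_scores(X_train, y_train, X_train, k=1), y_train)
holdout_auc = auc(knn_scores(X_train, y_train, X_test, k=1), y_test)
print(f"training AUC = {train_auc:.3f}")  # 1.000 by construction
print(f"holdout AUC  = {holdout_auc:.3f}")
print(f"drop-off     = {train_auc - holdout_auc:.3f}")
```

The training AUC is perfect only because the model has seen those exact cases; the holdout AUC is the honest estimate. Less extreme algorithms overfit less, but the same gap, in smaller form, is what the drop-off above refers to.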

Conclusion: The present study suggests that machine learning methods can contribute to identifying individuals who will later be arrested for a violent crime. Future directions include incorporating a larger and more varied set of predictors, integrating multiple longitudinal datasets to increase training capacity, and considering how these methods might be implemented.