Abstract: Applying Machine Learning to Prevention Research (Society for Prevention Research 24th Annual Meeting)

128 Applying Machine Learning to Prevention Research

Schedule:
Wednesday, June 1, 2016
Pacific N/O (Hyatt Regency San Francisco)
* noted as presenting author
Jennifer Villani, PhD, MPH, Health Science Policy Analyst, NIH Office of Disease Prevention, Rockville, MD
Jocelyn Lee, PhD, MPH, Health Scientist Administrator, NIH Office of Disease Prevention, Rockville, MD
Ranell L. Myles, PhD, MPH, CHES, Public Health Analyst, NIH Office of Disease Prevention, Rockville, MD
Sheri Schully, PhD, Health Scientist Administrator, NIH Office of Disease Prevention, Rockville, MD
David M. Murray, PhD, Associate Director for Prevention, NIH Office of Disease Prevention, Rockville, MD
B. Ian Hutchins, PhD, Data Scientist, NIH Office of Portfolio Analysis, Bethesda, MD
Carole Christian, PhD, Health Scientist Administrator, NIH Office of Portfolio Analysis, Bethesda, MD
George Santangelo, PhD, Director, NIH Office of Portfolio Analysis, Bethesda, MD
Pam Meyer, MS, Software engineer, National Institutes of Health, Bethesda, MD
Kirk Baker, PhD, Data scientist, National Institutes of Health, Bethesda, MD
Paula Fearon, PhD, Training Director, National Institutes of Health, Bethesda, MD
Introduction: The NIH Office of Disease Prevention (ODP) is collaborating with the NIH Office of Portfolio Analysis (OPA) to develop a machine learning framework that makes it easy to run multiple algorithms, such as Support Vector Machines and other customizable classifiers, and to build models that accurately identify prevention research grants. The approach uses semi-supervised learning methods to analyze data, recognize patterns, and classify grants according to predefined criteria.
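
As a rough illustration of what "easy execution of multiple algorithms" could look like, the sketch below runs several interchangeable classifiers over the same labeled examples and reports the best performer. It is not the ODP/OPA framework: the grant titles, the labels, and the names `train_majority`, `train_keyword`, and `run_all` are all hypothetical, and two toy classifiers stand in for Support Vector Machines so that the example needs only the Python standard library.

```python
from collections import Counter
from typing import Callable, Dict, List

# A trained classifier maps grant text to 1 (prevention) or 0 (not prevention).
Predictor = Callable[[str], int]
# A trainer takes labeled texts and returns a trained classifier.
Trainer = Callable[[List[str], List[int]], Predictor]


def train_majority(texts: List[str], labels: List[int]) -> Predictor:
    """Baseline: always predict the most common training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda text: majority


def train_keyword(texts: List[str], labels: List[int]) -> Predictor:
    """Toy stand-in for a real model: +1 per word seen in a positive
    example, -1 per word seen in a negative example."""
    weights: Counter = Counter()
    for text, label in zip(texts, labels):
        for word in set(text.lower().split()):
            weights[word] += 1 if label == 1 else -1

    def predict(text: str) -> int:
        score = sum(weights[w] for w in set(text.lower().split()))
        return 1 if score > 0 else 0

    return predict


def accuracy(predict: Predictor, texts: List[str], labels: List[int]) -> float:
    """Fraction of held-out examples classified correctly."""
    correct = sum(predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


def run_all(trainers: Dict[str, Trainer],
            train_texts: List[str], train_labels: List[int],
            test_texts: List[str], test_labels: List[int]) -> Dict[str, float]:
    """Train every registered classifier on the same data and score each."""
    return {
        name: accuracy(trainer(train_texts, train_labels), test_texts, test_labels)
        for name, trainer in trainers.items()
    }


# Hypothetical exemplars (invented for illustration, not real NIH grants).
train_texts = [
    "vaccine trial to prevent influenza in older adults",
    "screening program to prevent colorectal cancer",
    "school intervention to reduce youth smoking",
    "crystal structure of a bacterial membrane protein",
    "gene expression in zebrafish fin regeneration",
    "protein folding kinetics in yeast",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = [
    "intervention to prevent youth obesity",
    "protein structure of a viral enzyme",
]
test_labels = [1, 0]

trainers = {"majority": train_majority, "keyword": train_keyword}
scores = run_all(trainers, train_texts, train_labels, test_texts, test_labels)
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Because every classifier shares the same trainer signature, adding another algorithm under this design only requires registering one more entry in `trainers`.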

Methods: The ODP developed a prevention research taxonomy to classify the NIH prevention research portfolio. Using the taxonomy, the ODP identified a set of prevention exemplars (positive and negative examples) and used this set to train the classifiers. The OPA then applied the best-performing classifier to a new set of grants, classifying them based on input from the ODP. Subject matter experts from the ODP validated a random sample of the output (i.e., grants identified as prevention positive) to determine the sensitivity and specificity of the algorithm.
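
The sensitivity and specificity mentioned above can be computed from a confusion matrix that compares classifier output with expert judgment. The following is a minimal sketch with invented labels, not the authors' validation procedure. It assumes the expert-reviewed sample contains grants the classifier labeled negative as well as positive, since false negatives and true negatives are needed for the two measures. The function names are hypothetical.

```python
from typing import List, Tuple


def confusion_counts(predicted: List[int], expert: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating the expert label as ground truth."""
    tp = sum(p == 1 and e == 1 for p, e in zip(predicted, expert))
    fp = sum(p == 1 and e == 0 for p, e in zip(predicted, expert))
    fn = sum(p == 0 and e == 1 for p, e in zip(predicted, expert))
    tn = sum(p == 0 and e == 0 for p, e in zip(predicted, expert))
    return tp, fp, fn, tn


def sensitivity(tp: int, fn: int) -> float:
    """Share of true prevention grants that the classifier found."""
    if tp + fn == 0:
        raise ValueError("no expert-positive grants in the sample")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of non-prevention grants that the classifier correctly excluded."""
    if tn + fp == 0:
        raise ValueError("no expert-negative grants in the sample")
    return tn / (tn + fp)


# Invented labels for eight grants: 1 = prevention, 0 = not prevention.
predicted = [1, 1, 1, 0, 0, 0, 1, 0]  # classifier output
expert = [1, 1, 0, 0, 0, 1, 1, 0]     # subject matter expert judgment

tp, fp, fn, tn = confusion_counts(predicted, expert)
print("sensitivity:", sensitivity(tp, fn))  # 3 / (3 + 1) = 0.75
print("specificity:", specificity(tn, fp))  # 3 / (3 + 1) = 0.75
```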

Results: We will show preliminary results on the performance of the machine learning algorithm in identifying prevention research from the NIH grant portfolio.

Conclusions: Ultimately, this machine learning approach will facilitate the identification of patterns and trends in NIH prevention research funding, as well as research areas that may benefit from targeted investments by the NIH Institutes and Centers.