Abstract: Using Text Mining to Accelerate Scoping Reviews of NIH-Funded HIV Implementation Studies (Society for Prevention Research 27th Annual Meeting)

121 Using Text Mining to Accelerate Scoping Reviews of NIH-Funded HIV Implementation Studies

Wednesday, May 29, 2019
Seacliff C (Hyatt Regency San Francisco)
* noted as presenting author
Carlos Gallo, PhD, Research Assistant Professor, Northwestern University, Chicago, IL
Nanette Benbow, M.A.S., Research Assistant Professor, Northwestern University, Chicago, IL
J.D. Smith, PhD, Assistant Professor, Northwestern University, Chicago, IL
Juan Villamar, MSEd, Executive Coordinator, Center for Prevention Implementation Methodology, Northwestern University, Chicago, IL
C. Hendricks Brown, PhD, Professor, Northwestern University, Chicago, IL
Introduction: Systematic, scoping, and meta-analytic reviews are critical to advancing prevention science and to applying research to practice because they provide a comprehensive view of the state of the art on a particular topic through the identification and synthesis of data. Scoping reviews are used to evaluate the research evidence for a particular research question. They are labor intensive to conduct because they require careful examination of a vast number of publications, a task currently completed most often by human coders. Machine learning and automatic text mining methodologies can simplify the searching, summarizing, extracting, and reporting steps of systematic and scoping reviews. This paper describes the use of machine learning, text mining, and rule-based heuristics in a scoping review to identify recently funded HIV-related interventions that address the HIV continuum and to determine whether they are implementation research (IR) studies.

Methods: Following PRISMA guidelines, text mining methods were used in the identification and screening phases of this scoping review. In April 2018, we conducted a scoping review of studies in NIH RePORTER that were funded since FY 2013 and that contained HIV and NIH-defined IR terms. We extracted study titles, abstracts, funding agency, and study review section, and developed a set of keyword-based heuristics to exclude studies that focused on HIV basic science. Human coders double-coded studies meeting the eligibility criteria to identify and characterize HIV IR-related studies. A machine classifier based on Support Vector Machines was built and tested against human coding on two separate questions: whether a study was HIV-related and whether it was IR-related.
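The keyword-based exclusion step described above can be illustrated with a minimal sketch. The abstract does not publish the actual term lists or matching rules, so the keyword tuples and the function names (`is_eligible`, `_contains_any`) below are hypothetical placeholders, not the authors' heuristics.

```python
import re

# Hypothetical keyword lists: the abstract does not report the actual terms.
HIV_TERMS = ("hiv", "antiretroviral", "prep")
BASIC_SCIENCE_TERMS = ("in vitro", "mouse model", "protein structure")


def _contains_any(text, terms):
    """Return True if any term occurs in text as a whole word or phrase."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms
    )


def is_eligible(title, abstract):
    """Keep a study that mentions an HIV term and no basic-science term."""
    text = f"{title} {abstract}"
    return _contains_any(text, HIV_TERMS) and not _contains_any(
        text, BASIC_SCIENCE_TERMS
    )
```

Under these assumed lists, a record titled "Scaling up PrEP delivery in community clinics" would pass to human coding, while one titled "HIV envelope protein structure" would be excluded as basic science.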

Findings: Of the 4,630 unique studies, 848 (18%) were identified through text mining as meeting eligibility criteria. Human coders then double-coded these studies into four categories, identifying 594 HIV-related studies that focused on HIV prevention and care; 216 (36%) of these studies were IR-related and 108 (18%) met the NIH definition of IR. Compared with human coding, the machine classifier identified HIV-related and IR-related studies with accuracies of 73% and 65%, respectively.
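The reported percentages can be reproduced from the counts above, and accuracy against human coding can be read as the share of studies on which the classifier and the coders agree. The sketch below assumes that simple definition of accuracy; the abstract does not state how accuracy was computed, and the helper names `percent` and `accuracy` are illustrative.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


def accuracy(machine_labels, human_labels):
    """Fraction of studies where the machine label equals the human label."""
    if len(machine_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    matches = sum(m == h for m, h in zip(machine_labels, human_labels))
    return matches / len(human_labels)


# Counts reported in the Findings paragraph.
total_unique = 4630
eligible = 848
hiv_related = 594
ir_related = 216
nih_defined_ir = 108

eligible_pct = percent(eligible, total_unique)           # share passing text mining
ir_related_pct = percent(ir_related, hiv_related)        # share of HIV-related studies
nih_ir_pct = percent(nih_defined_ir, hiv_related)        # share of HIV-related studies
```

Note that the 36% and 18% figures use the 594 HIV-related studies as the denominator, not the 848 eligible studies.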

Conclusions: Scoping reviews are resource intensive. In addition, IR in HIV is changing rapidly, so reviews can quickly become outdated. The combination of machine learning and text mining used in this study can overcome these barriers by 1) identifying abstracts potentially relevant to HIV and IR that were excluded before human-based review, 2) accelerating scoping reviews and allowing them to be updated quickly on a regular basis, and 3) helping classify studies by stage of IR.