Methods: Following PRISMA guidelines, we used text mining in the identification and screening phases of this scoping review. In April 2018, we conducted a scoping review of studies in NIH RePORTER that were funded since FY 2013 and contained HIV and NIH-defined IR terms. We extracted study titles, abstracts, funding agencies, and study review sections, and developed a set of keyword-based heuristics to exclude studies that focused on HIV basic science. Human coders double-coded the studies meeting the eligibility criteria to identify and characterize HIV IR-related studies. A machine classifier based on Support Vector Machines was built and tested against human coding on whether studies were HIV-related and IR-related, respectively.
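The keyword-based exclusion step described above can be illustrated with a minimal sketch. The keyword list, the record fields (`title`, `abstract`), and the function names below are hypothetical; the review's actual heuristics are not specified here and may have been more elaborate.

```python
import re

# Hypothetical exclusion terms; the review's actual keyword list is not given.
BASIC_SCIENCE_TERMS = ["in vitro", "murine", "crystal structure", "viral reservoir"]

# One case-insensitive pattern that matches any term on word boundaries.
_PATTERN = re.compile(
    "|".join(r"\b" + re.escape(term) + r"\b" for term in BASIC_SCIENCE_TERMS),
    flags=re.IGNORECASE,
)


def is_basic_science(title: str, abstract: str) -> bool:
    """Return True if the title or abstract contains any exclusion keyword."""
    return _PATTERN.search(f"{title} {abstract}") is not None


def screen(studies):
    """Keep only studies that are not flagged as basic science."""
    return [s for s in studies if not is_basic_science(s["title"], s["abstract"])]


# Toy records for illustration only; these are not studies from the review.
studies = [
    {"title": "Scaling up PrEP delivery in community clinics",
     "abstract": "We evaluate implementation strategies for PrEP uptake."},
    {"title": "Crystal structure of an HIV-1 envelope trimer",
     "abstract": "We resolve the structure in vitro."},
]
kept = screen(studies)
```

A rule of this kind is cheap to run over thousands of records, which is why it suits the screening phase; its precision depends entirely on the chosen terms, so borderline exclusions would still merit human review.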
Findings: Of the 4,630 unique studies, 848 (18%) were identified through text mining as meeting the eligibility criteria. Human coders double-coded these studies into four categories, identifying 594 HIV-related studies focusing on HIV prevention and care; 216 (36%) of these were IR-related and 108 (18%) met the NIH definition of IR. The machine classifier classified HIV-related and IR-related studies with accuracies of 73% and 65%, respectively, when compared with human coding.
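As a rough illustration of how these figures are computed, the sketch below reproduces the reported proportions from the counts above and shows accuracy as the share of studies on which machine and human labels agree. The label vectors are toy values, not data from the review, and the helper names `pct` and `accuracy` are hypothetical.

```python
def pct(part: int, whole: int) -> int:
    """Percentage of `whole` represented by `part`, rounded to the nearest integer."""
    return round(100 * part / whole)


def accuracy(human, machine) -> float:
    """Fraction of items on which the machine label equals the human label."""
    if len(human) != len(machine):
        raise ValueError("label lists must have the same length")
    agree = sum(h == m for h, m in zip(human, machine))
    return agree / len(human)


# Proportions reported in the findings, recomputed from the stated counts.
eligible_share = pct(848, 4630)    # eligible studies among all unique studies
ir_related_share = pct(216, 594)   # IR-related among HIV prevention/care studies
nih_ir_share = pct(108, 594)       # meeting the NIH definition of IR

# Toy labels (1 = IR-related, 0 = not); illustrative only.
human_labels = [1, 0, 1, 1]
machine_labels = [1, 0, 0, 1]
toy_accuracy = accuracy(human_labels, machine_labels)
```

Accuracy alone can mislead when one class dominates, so a fuller evaluation would usually also report per-class precision and recall.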
Conclusions: Scoping reviews are resource intensive. In addition, IR in HIV is changing rapidly, which may cause reviews to become outdated quickly. The combination of machine learning and text mining used in this study can overcome these barriers by 1) identifying abstracts potentially relevant to HIV and IR that were excluded before human-based review, 2) accelerating scoping reviews and allowing them to be updated quickly and regularly, and 3) helping classify studies by stage of IR.