Abstract: Synthesis of Prevention Programs on LGBT Youth Using Automatic Text Mining (Society for Prevention Research 24th Annual Meeting)

174 Synthesis of Prevention Programs on LGBT Youth Using Automatic Text Mining

Schedule:
Wednesday, June 1, 2016
Bayview A (Hyatt Regency San Francisco)
* noted as presenting author
Carlos Gallo, PhD, Research Assistant Professor, Northwestern University, Chicago, IL
Stacie Harissis, BS, Program Assistant, Northwestern University, Chicago, IL
Michael E. Newcomb, PhD, Assistant Professor, Northwestern University, Chicago, IL
Brian S. Mustanski, PhD, Assoc Prof and Program Director, Northwestern University, Chicago, IL
C. Hendricks Brown, PhD, Professor, Northwestern University, Chicago, IL
LGBT youth experience disproportionate rates of drug use, high-risk sexual behavior, and suicidal ideation and attempts (Meyers, 2003; Paul et al., 2002). LGBT youth also experience substantial discrimination, bullying, and violence (Birkett, Espelage, & Koenig, 2009). While eliminating these minority health disparities is an important public health goal, gaps in scientific knowledge exist in the LGBT youth prevention literature. First, due to complications in measuring sexual orientation, little is known about whether proven interventions that target the general population also benefit LGBT youth. Second, LGBT youth experience a wide range of bullying and violence during early puberty. Many preventive interventions start in the early school years, a time when youth begin to experience and express their sexual identity. It is currently unknown whether existing early preventive interventions also benefit LGBT youth. This study uses automatic text processing methods to identify trials that included LGBT youth in their samples, so that data can be harmonized and effectiveness for this population tested. We first compiled a list of measures describing sexual attraction, identity, and behavior in order to identify trials that included such constructs in their data collection. From this list we identified 6,130 published papers of potential relevance. The text of each paper is extracted, segmented into sections (e.g., introduction, methods, results), and parsed semantically to obtain a semantic vector for each sentence. We identified sentences that referred to sexual debut, sexual expression, and sexual risk behavior, which allows us to perform an automatic semantic sentence search. This method improves on traditional keyword-matching searches by working at the sentence and semantic level.
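To illustrate the difference between keyword matching and a sentence-level semantic search, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the semantic vectors are built, so a small hand-written concept map (`CONCEPTS`, hypothetical) stands in for the semantic parser, and cosine similarity over concept counts stands in for the vector comparison. The function names are also assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical concept map. It stands in for a real semantic parser,
# which the abstract does not describe; the entries are illustrative only.
CONCEPTS = {
    "sexual": "sex", "sexually": "sex", "sex": "sex", "intercourse": "sex",
    "debut": "first", "first": "first", "initiation": "first",
}


def sentence_vector(sentence):
    """Map a sentence to a sparse vector of concept counts."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(CONCEPTS.get(token, token) for token in tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_sentences(text):
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def search(query, text, top_k=3):
    """Return the top_k (score, sentence) pairs most similar to the query."""
    query_vec = sentence_vector(query)
    scored = [(cosine(query_vec, sentence_vector(s)), s)
              for s in split_sentences(text)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


methods_text = (
    "Participants reported age at first intercourse. "
    "The school had 300 students. "
    "Weather was mild."
)
results = search("sexual debut", methods_text)
```

In this toy example the query "sexual debut" ranks the sentence about age at first intercourse highest even though neither query word appears in it, which is the kind of match a plain keyword search would miss.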
We present a set of semi-automatic tools, including PDF section extraction and semantic parsing, that help narrow a literature search to the relevant papers. These text processing tools not only facilitate the review process but also make a literature search easier to replicate.
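The section segmentation step could look roughly like the sketch below. It assumes the PDF has already been converted to plain text and that section headings appear on their own lines; the heading list (`SECTION_HEADINGS`) and the function name are assumptions for illustration, not the tools presented here.

```python
import re

# Assumed heading names; real papers vary and would need a longer list.
SECTION_HEADINGS = ("abstract", "introduction", "methods",
                    "results", "discussion", "references")

# A heading is a line holding only a known section name,
# optionally numbered ("2. Methods") or followed by a colon.
HEADING_RE = re.compile(
    r"^\s*(?:\d+\.?\s+)?(" + "|".join(SECTION_HEADINGS) + r")\s*:?\s*$",
    re.IGNORECASE,
)


def segment_sections(text):
    """Split extracted paper text into a dict of section name -> body text."""
    sections = {}
    current = "front_matter"  # text that appears before the first heading
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            current = match.group(1).lower()
            sections.setdefault(current, [])
        elif line.strip():
            sections.setdefault(current, []).append(line.strip())
    return {name: " ".join(lines) for name, lines in sections.items()}


paper_text = (
    "A Trial Title\n"
    "Introduction\n"
    "Bullying is common.\n"
    "2. Methods\n"
    "We surveyed youth.\n"
    "They reported identity.\n"
    "RESULTS\n"
    "Effects were found.\n"
)
sections = segment_sections(paper_text)
```

Restricting a later sentence search to `sections["methods"]`, for example, would focus it on the part of a paper where measures of sexual attraction, identity, and behavior are most likely to be described.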