Abstract: Abstract of Distinction: Artificial Intelligence and Inclusion: Formerly Gang-Involved Youth As Domain Experts for Analyzing Unstructured Twitter Data (Society for Prevention Research 27th Annual Meeting)

622 Abstract of Distinction: Artificial Intelligence and Inclusion: Formerly Gang-Involved Youth As Domain Experts for Analyzing Unstructured Twitter Data

Friday, May 31, 2019
Garden Room B (Hyatt Regency San Francisco)
William R Frey, MSW, Doctoral Student, Columbia University, New York, NY
Desmond Patton, PhD, Associate Professor, Columbia University, New York, NY
Michael Gaskell, PhD, Postdoc, Columbia University, New York, NY
Kyle McGregor, PhD, Assistant Professor, New York University, New York, NY
Introduction: When analyzing social media data from marginalized communities, algorithms cannot accurately interpret offline context, which may lead to dangerous assumptions about, and harmful consequences for, those communities. To address this challenge, we hired formerly gang-involved youth as domain experts to contextualize social media data and help create inclusive, community-informed algorithms. Using data from the Gang Intervention and Computer Science Project, we describe the process of involving formerly gang-involved youth in developing a new prototype natural language processing (NLP) system that detects aggression and loss in Twitter data. We offer a contextually driven, interdisciplinary approach spanning social work and data science that integrates domain insights into the training of social work annotators and the production of algorithms for positive social impact.

Methods: We hired two young men (one African American and one Latino), aged 18 or older, who live in Chicago neighborhoods with high rates of violence to work as domain experts (DEs). They initially provided interpretations of 185 randomly sampled tweets from our corpus and later gave more focused insights in response to our annotators' questions and challenges. We developed a multistep process that integrates these insights to inform MSW student annotators' interpretations, which provide the training data for our NLP system. Our process includes: 1) identifying, onboarding, and integrating DEs, 2) initial DE interpretations of Twitter data, 3) training and assessing student annotator quality, and 4) iterative DE involvement and reconciliation of student annotator disagreement.
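As an illustration only, steps 3 and 4 could be supported by a small script that scores agreement between two student annotators and lists the tweets they labeled differently, so those tweets can be brought to the DEs. This is a minimal sketch under stated assumptions, not the project's actual tooling: the use of Cohen's kappa, the function names, the tweet IDs, and the "other" label are all hypothetical; only "aggression" and "loss" come from the abstract.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same tweets.

    Assumption: kappa is used here as one common agreement measure;
    the abstract does not say which measure the project used.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    # Share of tweets on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    all_labels = set(labels_a) | set(labels_b)
    expected = sum(count_a[lab] * count_b[lab] for lab in all_labels) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


def disagreements(tweet_ids, labels_a, labels_b):
    """Return the IDs of tweets the two annotators labeled differently."""
    return [t for t, a, b in zip(tweet_ids, labels_a, labels_b) if a != b]


# Hypothetical example data (not from the study).
tweet_ids = ["t1", "t2", "t3", "t4"]
annotator_1 = ["aggression", "loss", "other", "loss"]
annotator_2 = ["aggression", "loss", "loss", "loss"]

kappa = cohen_kappa(annotator_1, annotator_2)
needs_de_review = disagreements(tweet_ids, annotator_1, annotator_2)
```

In this sketch, the tweets in `needs_de_review` are the ones that would be queued for focused DE input, while the kappa value gives a rough check on annotator quality over time.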

Results: Incorporating broad DE insights into student annotator training and iteratively including focused DE insights throughout the annotation process led to rigorously trained student annotators and a more robust understanding of social media posts by gang-involved and gang-affiliated youth. Additionally, the involvement of DEs unearthed contextual insights specific to Chicago neighborhoods with high rates of violence. We provide seven key areas of insight as examples: language, emojis, song lyrics, behavioral/temporal cues, people, neighborhood references, and gang/crew knowledge. We expand on these seven areas with three case examples.

Conclusions: DEs must be involved in the interpretation of unstructured data, solution creation, and other aspects of the research process. This goes beyond harvesting and capturing domain expertise. Involving DEs in various areas of social and data science research, including mechanisms for accountability and ethically sound research practices, is critical to creating algorithms that are trained to support and protect marginalized youth and communities.