Methods: We hired two young men, one African American and one Latino, both 18 years or older and living in Chicago neighborhoods with high rates of violence, to work as domain experts (DEs). They first provided interpretations of 185 randomly sampled tweets from our corpus and later offered more focused insights in response to our annotators' questions and challenges. We developed a multistep process that integrates these insights into the interpretations of Master of Social Work (MSW) student annotators, whose annotations provide the training data for our NLP system. Our process includes: 1) identifying, onboarding, and integrating DEs, 2) initial DE interpretations of Twitter data, 3) training and assessing student annotator quality, and 4) iterative DE involvement and reconciliation of disagreements among student annotators.
Results: Incorporating broad DE insights into student annotator training, and iteratively including focused DE insights throughout the annotation process, produced rigorously trained student annotators and a more robust understanding of social media posts by gang-involved and gang-affiliated youth. The involvement of DEs also unearthed contextual insights specific to Chicago neighborhoods with high rates of violence. We present seven key areas of insight as examples: language, emojis, song lyrics, behavioral/temporal cues, people, neighborhood references, and gang/crew knowledge. We illustrate these seven areas with three case examples.
Conclusions: DEs must be involved in the interpretation of unstructured data, the creation of solutions, and other aspects of the research process. Such involvement goes beyond merely harvesting and capturing domain expertise. Involving DEs across social and data science research, including through mechanisms for accountability and ethically sound research practices, is critical to creating algorithms that truly support and protect marginalized youth and communities.