Abstract: Navigating Heavy (metal) Data: Lead Exposure and Educational Outcomes (Society for Prevention Research 27th Annual Meeting)

176 Navigating Heavy (metal) Data: Lead Exposure and Educational Outcomes

Wednesday, May 29, 2019
Seacliff B (Hyatt Regency San Francisco)
Rashelle Musci, PhD, Assistant Professor, The Johns Hopkins University, Baltimore, MD
Jeffrey Grigg, PhD, Assistant Professor, The Johns Hopkins University, Baltimore, MD
Heather Volk, PhD, Associate Professor, The Johns Hopkins University, Baltimore, MD
Jana C. Goins, MHS, Epidemiologist, Baltimore City Department of Health, Baltimore, MD
Faith Connolly, PhD, Assistant Professor, The Johns Hopkins University, Baltimore, MD
Introduction. Big data (the analysis of large datasets assembled by querying surveillance and/or medical records resources) has gained increasing attention and use in academic research. Such data structures, however, have not traditionally linked environmental exposures with indicators of school performance. In this presentation we discuss the development of an interdisciplinary collaboration that produced a resource of individual-level information covering the city of Baltimore. This integration will facilitate the investigation of associations between the physical and social environment and academic achievement.

Method. The project merges three administrative sources of population-level data for Baltimore City since 2000: birth certificate data, individual-level lead measurements, and public school records. Extensive care has been taken to harmonize the data appropriately. The matching process uses a probabilistic ("fuzzy") matching algorithm on the child’s first name, last name, and date of birth (Christen, 2012; Harron, Goldstein, & Dibben, 2016; Wasi & Flaaen, 2015). The harmonized data include birth records from the Maryland Department of Health and Mental Hygiene/Baltimore City Health Department. These vital statistics include date of birth, birth weight and length, child gender, clinical estimate of gestation, race and ethnicity, maternal age, maternal education level, Medicaid eligibility, parental marital status, prenatal care, and census tract. Lead registry data from the Maryland Department of the Environment are available beginning in 1992 and span twenty-four years. Public school records from 2000 to 2018 include grade promotion, achievement test scores, disciplinary records, and other outcomes.
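
To illustrate the kind of fuzzy matching described in the Method paragraph, the following is a minimal sketch only, not the project's actual algorithm. It scores candidate pairs with a simplified weighted string similarity rather than a full probabilistic linkage model. The field names, weights, threshold, and records are all hypothetical and synthetic.

```python
from difflib import SequenceMatcher

# Hypothetical weights and threshold, chosen only for illustration;
# they are NOT taken from the project.
WEIGHTS = {"first": 0.3, "last": 0.4, "dob": 0.3}
THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Return a 0-1 string similarity after trimming and lowercasing."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted sum of per-field similarities (first name, last name, DOB)."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def best_match(record: dict, candidates: list):
    """Return (candidate, score) for the top-scoring candidate.

    The candidate is None when the best score falls below THRESHOLD.
    """
    if not candidates:
        return None, 0.0
    score, cand = max(
        ((match_score(record, c), c) for c in candidates),
        key=lambda pair: pair[0],
    )
    return (cand if score >= THRESHOLD else None), score


# Synthetic records: a birth record and two school records.
birth = {"first": "Jonathan", "last": "Smith", "dob": "2004-03-17"}
school = [
    {"first": "Jonathon", "last": "Smith", "dob": "2004-03-17"},  # spelling variant
    {"first": "Maria", "last": "Lopez", "dob": "2005-11-02"},
]

matched, score = best_match(birth, school)
```

Here the spelling variant "Jonathon" still links to "Jonathan" because the last name and date of birth agree exactly. In practice, pairs scoring near the threshold would typically be set aside for clerical review rather than accepted automatically.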

Results. The matching occurred in stages, with careful attention to the protection of personally identifiable information. Prior to the matching process, the study team met with stakeholders, including individuals at the city school system, the state lead commission, community leaders, and parents. After the matching was completed, an anonymized file stripped of all personally identifiable information was produced for analysis.
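
The final step, producing an analysis file without personally identifiable information, might be sketched as follows. This is an assumption-laden illustration rather than the study team's procedure: the identifying fields, the outcome field names, and the record are hypothetical and synthetic, and which fields count as identifying would in practice depend on the governing data use agreements.

```python
import secrets

# Hypothetical set of identifying fields used for linkage.
PII_FIELDS = {"first", "last", "dob"}


def deidentify(records, pii_fields=PII_FIELDS):
    """Split linked records into analysis rows and a separate crosswalk.

    Each row receives a random study ID that is not derived from any
    identifier. The crosswalk maps study IDs back to the identifying
    fields and would be stored apart from the analysis file.
    """
    analysis_rows, crosswalk = [], {}
    for rec in records:
        study_id = secrets.token_hex(8)
        crosswalk[study_id] = {f: rec[f] for f in pii_fields if f in rec}
        row = {k: v for k, v in rec.items() if k not in pii_fields}
        row["study_id"] = study_id
        analysis_rows.append(row)
    return analysis_rows, crosswalk


# Synthetic linked record with hypothetical outcome fields.
linked = [
    {
        "first": "Jonathan",
        "last": "Smith",
        "dob": "2004-03-17",
        "blood_lead_ug_dl": 3.2,
        "grade_promoted": True,
    },
]

rows, xwalk = deidentify(linked)
```

Keeping the crosswalk separate from the analysis rows means analysts work only with study IDs, while re-linkage remains possible for authorized staff.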

Discussion. This big data project has the potential to inform a number of local and national policies as we move forward to better understand long-term impacts of early life lead exposure. It also brings to light a number of challenges associated with the use of educational and health data within a big data framework. We will discuss both the challenges and potential solutions to the ethical, data security, and analytic concerns of harmonizing health and educational data.