Abstract: Abstract of Distinction: Building on the Rich Metadata from Decades of Substance Abuse Studies: The Potential for Common Data Elements (CDEs) to Enhance the Identification of Health Data across Different Research Projects (Society for Prevention Research 26th Annual Meeting)

141 Abstract of Distinction: Building on the Rich Metadata from Decades of Substance Abuse Studies: The Potential for Common Data Elements (CDEs) to Enhance the Identification of Health Data across Different Research Projects

Schedule:
Wednesday, May 30, 2018
Congressional D (Hyatt Regency Washington, Washington, DC)
* noted as presenting author
Susan Leonard, PhD, Associate Research Scientist, University of Michigan-Ann Arbor, Ann Arbor, MI
Kaye Marz, MS, Archive Manager, University of Michigan-Ann Arbor, Ann Arbor, MI
Introduction: Continued analyses of key datasets are essential to understanding the underlying causes of substance use and addiction, and they multiply the benefits of our nation’s investment in this science. ICPSR and the National Addiction and HIV Data Archive Program (NAHDAP) disseminate data from hundreds of NIH-funded research studies, as well as data collected with support from other agencies and foundations. Many of these studies include questions about health outcomes or health status that are not easily discovered with current search protocols, which can be either too narrow or too broad. With funding from NIDA, we are working to increase the use of these extant data for health research by making these variables easier to identify. This greatly benefits the research community by improving the discoverability of relevant health concepts within and, more importantly, across the multiple studies maintained in our repositories.

Methods: To pilot this work, we added variable-level metadata from a specific set of alternative controlled vocabularies to selected extant data collections in the NAHDAP collection, focusing on opioid use and abuse. We developed use cases with related search terms, which we used to select Common Data Elements (CDEs) from the NIH CDE Repository and ontology terms from SNOMED, PROMIS, and PICO. This limited set of controlled vocabulary terms was added to the metadata at the variable level. Pre- and post-tests were conducted to assess improvements in the discoverability of health outcome and other health-related variables.

Results: Early indications suggest that discoverability will improve (the project was still underway at the time of writing). The enhanced metadata allows the search to find individual variables even when each question is narrowly focused (e.g., participants are asked about the use of specific types of opioids, but the term ‘opioid’ does not appear in the question). Pre-testing reveals that the results of naive searches on our extensive existing metadata tend to be overwhelming, returning hundreds of studies with potentially thousands of variables. Piloting this work has yielded useful insights into the strengths and limitations of CDEs and ontologies (and other controlled vocabularies), and into how controlled vocabularies interact with search algorithm development and filtering.
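The mechanism described above, where a search for "opioid" reaches variables whose question text names only specific drugs, can be illustrated with a minimal sketch. It assumes a simple in-memory list of variables, each carrying a set of concept tags standing in for CDE or ontology terms; the `Variable` class, both search functions, and all study, variable, and tag names are hypothetical and do not reflect the actual ICPSR/NAHDAP search system or schema.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """A survey variable with optional concept tags (hypothetical schema)."""
    study: str
    name: str
    question: str
    concepts: set = field(default_factory=set)


def naive_search(variables, term):
    """Match the term against question text only."""
    term = term.lower()
    return [v for v in variables if term in v.question.lower()]


def concept_search(variables, term):
    """Match the term against question text or any attached concept tag."""
    term = term.lower()
    return [
        v for v in variables
        if term in v.question.lower() or any(term in c.lower() for c in v.concepts)
    ]


# Hypothetical variables; none of these come from an actual NAHDAP study.
variables = [
    Variable("Study A", "Q12", "In the past 30 days, have you used heroin?",
             {"Opioid use"}),
    Variable("Study A", "Q13", "Have you taken OxyContin without a prescription?",
             {"Opioid use", "Prescription drug misuse"}),
    Variable("Study B", "V40", "How many cigarettes do you smoke per day?",
             {"Tobacco use"}),
]

print([v.name for v in naive_search(variables, "opioid")])    # []
print([v.name for v in concept_search(variables, "opioid")])  # ['Q12', 'Q13']
```

Text matching alone misses both opioid questions because neither uses the word, while the tagged search finds them without also returning the unrelated tobacco item.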

Conclusions: Variable-level metadata using CDEs and mid-level concepts from curated ontologies both improves the discoverability of variables and limits the search results to a meaningful set.