Methods: To pilot this work, we added variable-level metadata from a specific set of alternative controlled vocabularies to selected existing data collections held by NAHDAP, focusing on opioid use and abuse. We developed use cases and related search terms, then used them to select Common Data Elements (CDEs) from the NIH CDE Repository and ontology terms from SNOMED, PROMIS and PICO. This limited set of controlled-vocabulary terms was added to the metadata at the variable level. We conducted pre- and post-tests to assess improvements in the discoverability of health-outcome and health-related variables.
Results: The project is still underway at the time of writing, but early indications are that discoverability will improve. The enhanced metadata lets a search find individual variables whose questions are narrowly focused (e.g., participants are asked about their use of specific types of opioids, but the term ‘opioid’ never appears in the question). Pre-testing shows that naive searches of our extensive existing metadata tend to return overwhelming results: hundreds of studies with potentially thousands of variables. Piloting this work has also yielded insights into the strengths and limitations of CDEs, ontologies and other controlled vocabularies, and into how controlled vocabularies interact with search algorithm development and filtering.
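The effect described above can be illustrated with a minimal sketch. It is not the project's search implementation: the `Variable` record, the `search` function, the variable names, the question labels and the concept tags are all hypothetical, and plain substring matching stands in for whatever the production search algorithm does.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical variable-level metadata record."""
    name: str
    label: str                                         # question text as asked
    concepts: set[str] = field(default_factory=set)    # added controlled-vocabulary terms


def search(variables, term, use_concepts=True):
    """Return names of variables whose label (or, optionally, concept tags) contain the term."""
    term = term.lower()
    hits = []
    for v in variables:
        in_label = term in v.label.lower()
        in_concepts = use_concepts and any(term in c.lower() for c in v.concepts)
        if in_label or in_concepts:
            hits.append(v.name)
    return hits


# Illustrative records only; these are not real NAHDAP variables or CDE identifiers.
variables = [
    Variable("Q12", "Used oxycodone without a prescription in the past 30 days", {"opioid use"}),
    Variable("Q13", "Used hydrocodone without a prescription in the past 30 days", {"opioid use"}),
    Variable("Q40", "Number of cigarettes smoked per day", {"tobacco use"}),
]

before = search(variables, "opioid", use_concepts=False)  # label text only
after = search(variables, "opioid")                       # label text plus concept tags
```

In this toy data set, searching the question text alone misses both opioid items because neither label contains the word, while the concept tags surface them and still exclude the unrelated tobacco item.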
Conclusions: Variable-level metadata that draws on CDEs and mid-level concepts from curated ontologies both improves the discoverability of variables and limits search results to a meaningful set.