Abstract: TECH DEMO: Understanding Information Privacy and Record Linkage (Society for Prevention Research 26th Annual Meeting)

82 TECH DEMO: Understanding Information Privacy and Record Linkage

Schedule:
Tuesday, May 29, 2018
Columbia A/B (Hyatt Regency Washington, Washington, DC)
* noted as presenting author
Theodoros Giannouchos, MSc, PhD student, Graduate Research Assistant, Texas A&M University, Bryan, TX
Qinbo Li, MS, Graduate Research Assistant, Texas A&M University, College Station, TX
Yumei Li, High School, Student, Texas A&M University, College Station, TX
Gurudev Ilangovan, BA, Student, Texas A&M University, College Station, TX
Eric Ragan, PhD, Assistant Professor, Texas A&M University, College Station, TX
Hye-Chung Kum, PhD, Associate Professor, Texas A&M University, College Station, TX
Introduction: In the digital era, many new sources of data are available for prevention science (e.g., government administrative data, electronic health records [EHR]). Research often requires integrating these sources, but the lack of a common identifier across them makes integration difficult. Record linkage is the process of identifying the same person across such diverse datasets without a common identifier. When integrating person-level data, two competing demands pose a challenge: people must interact with identifying information to obtain accurate linkages, yet doing so raises information privacy concerns. This technology demonstration is an online tutorial that lets users understand, through direct experience, the tradeoffs between information privacy and data utility during record linkage.

Methods: We have developed the privacy preserving interactive record linkage (PPIRL) framework, which incrementally and optimally discloses only the information needed for record linkage in order to achieve both high-quality linkage and low privacy risk. The tutorial software will present users with a short self-learning module on record linkage, followed by different record linkage situations. In the PPIRL framework, all data are masked initially, and users make record linkage decisions aided by supplemental visual markup that flags cases such as missing values, swapped first and last names, transposed characters, and data discrepancies (e.g., only the second letter of a name differs). Users can also incrementally disclose attributes of a record as needed. The PPIRL framework also has a privacy budget system that measures the privacy risk of disclosing information. The budget is measured in two ways: first, as the percentage of characters disclosed; second, with a k-anonymity based algorithm that measures the actual risk of being identified.
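The two budget measures described above can be illustrated with a minimal sketch. This is not the PPIRL implementation, which the abstract does not specify; every name here (`mask_record`, `percent_disclosed`, `anonymity_set_size`, the `*` mask character, the sample names) is hypothetical. The sketch assumes that masking preserves field length and that identification risk is approximated by counting how many people in a reference population remain consistent with what has been disclosed (a k-anonymity style count, where a smaller k means higher risk).

```python
MASK = "*"  # assumed mask character; masked views keep the field length


def mask_record(record, disclosed):
    """Return a view of `record` showing only the disclosed character positions.

    `disclosed` maps a field name to the set of character indices made visible.
    """
    view = {}
    for field, value in record.items():
        shown = disclosed.get(field, set())
        view[field] = "".join(
            ch if i in shown else MASK for i, ch in enumerate(value)
        )
    return view


def percent_disclosed(record, disclosed):
    """First budget measure: percentage of the record's characters disclosed."""
    total = sum(len(value) for value in record.values())
    shown = sum(
        1
        for field, value in record.items()
        for i in disclosed.get(field, set())
        if 0 <= i < len(value)
    )
    return 100.0 * shown / total if total else 0.0


def is_consistent(view, candidate):
    """True if `candidate` could be the person behind the masked `view`."""
    for field, masked in view.items():
        value = candidate.get(field, "")
        if len(value) != len(masked):
            return False
        if any(m != MASK and m != c for m, c in zip(masked, value)):
            return False
    return True


def anonymity_set_size(view, population):
    """Second budget measure (k-anonymity style): number of people in
    `population` who are indistinguishable given the disclosed view."""
    return sum(is_consistent(view, person) for person in population)


# Made-up sample data, for illustration only.
population = [
    {"first": "JOHN", "last": "SMITH"},
    {"first": "JOAN", "last": "SMITH"},
    {"first": "JACK", "last": "SMYTH"},
    {"first": "MARY", "last": "JONES"},
]
record = population[0]

disclosed = {"first": {0, 1}}            # reveal the first two letters
view = mask_record(record, disclosed)    # {'first': 'JO**', 'last': '*****'}
print(view, round(percent_disclosed(record, disclosed), 1),
      anonymity_set_size(view, population))
```

In this toy population, revealing "JO" discloses 2 of 9 characters and leaves two consistent candidates (k = 2). Revealing a third letter raises the character percentage only modestly but drops k to 1, which is the kind of gap between the two measures that a k-anonymity based budget is meant to capture.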

Results: Through the software, users can experience the balance between information disclosure and accuracy of results when making linkage decisions with sample data. The PPIRL demonstration will enable attendees to use the software, provide feedback on the benefits and drawbacks of the framework, and suggest relevant improvements. We believe that this feedback is crucial for the final software design.

Conclusion: Record linkage is a critical challenge that must be addressed for prevention science to leverage the power of big data siloed in different databases. The PPIRL demonstration will illustrate the key challenges of linking uncoordinated databases for prevention science and the importance of addressing them. Opinions, ideas, and feedback will be highly encouraged during the demonstration to improve the safe management of information while still allowing for high-quality record linkage.