Methods: We have developed the privacy-preserving interactive record linkage (PPIRL) framework, which incrementally and optimally discloses only the information needed for record linkage in order to achieve both high-quality linkage and low privacy risk. The tutorial software will present users with a short self-learning module on record linkage, followed by a series of record linkage scenarios. In the PPIRL framework, all data are initially masked, and users make linkage decisions aided by supplemental visual markup that flags cases such as missing values, swapped first and last names, transposed characters, and data discrepancies (e.g., names that differ only in the second letter). Users can also incrementally disclose attributes of a record as needed. The PPIRL framework also includes a privacy budget system that measures the privacy risk of disclosing information. The budget is measured in two ways: first, as the percentage of characters disclosed, and second, with a k-anonymity-based algorithm that measures the risk of an individual being identified.
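The two budget measures described above can be illustrated with a minimal Python sketch. It is not the PPIRL implementation: the sample records, field names, and function names are hypothetical, disclosure is assumed to happen one whole cell at a time, and the risk is assumed to be 1/k, where k is the number of records sharing the disclosed values.

```python
from collections import Counter

# Hypothetical sample records; field names are illustrative only.
RECORDS = [
    {"first": "JOHN", "last": "SMITH", "dob": "1980-01-02"},
    {"first": "JON", "last": "SMITH", "dob": "1980-01-02"},
    {"first": "MARY", "last": "JONES", "dob": "1975-06-30"},
    {"first": "MARY", "last": "JONES", "dob": "1975-06-30"},
]


def char_disclosure_pct(records, disclosed_cells):
    """Percentage of all characters revealed.

    `disclosed_cells` is a set of (row index, field name) pairs; each
    listed cell is assumed to be fully unmasked.
    """
    total = sum(len(value) for rec in records for value in rec.values())
    shown = sum(len(records[i][field]) for i, field in disclosed_cells)
    return 100.0 * shown / total if total else 0.0


def k_for_record(records, index, fields):
    """Number of records sharing this record's values on `fields`."""
    ordered = sorted(fields)
    counts = Counter(tuple(rec[f] for f in ordered) for rec in records)
    return counts[tuple(records[index][f] for f in ordered)]


def identification_risk(records, index, fields):
    """Assumed risk model: 1/k for the disclosed field combination."""
    return 1.0 / k_for_record(records, index, fields)


if __name__ == "__main__":
    cells = {(0, "first"), (1, "first")}
    print(f"characters disclosed: {char_disclosure_pct(RECORDS, cells):.1f}%")
    print("risk, record 0, first name:", identification_risk(RECORDS, 0, {"first"}))
    print("risk, record 2, full name:", identification_risk(RECORDS, 2, {"first", "last"}))
```

In this sketch, disclosing a value that is unique in the data set (k = 1) gives the highest risk, while a value shared by several records costs less, which is the intuition behind a k-anonymity-based budget.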
Results: Using sample data, users of the software can experience the trade-off between information disclosure and the accuracy of linkage decisions. The PPIRL demonstration will enable attendees to use the software, provide feedback on the benefits and drawbacks of the framework, and suggest improvements. We believe this feedback is crucial for the final software design.
Conclusion: Record linkage is a critical challenge that must be addressed for prevention science to leverage the power of big data siloed in different databases. The PPIRL demonstration will illustrate the key challenges of linking uncoordinated databases for prevention science and the importance of addressing them. Opinions, ideas, and feedback will be strongly encouraged during the demonstration to improve the safe management of information while still allowing for high-quality record linkage.