Methods: We developed the privacy preserving interactive record linkage (PPIRL) framework, which uses strategies that weigh the tradeoffs between the privacy and the utility of data. The privacy objective of the PPIRL framework is to guarantee against sensitive attribute disclosure (e.g., cancer status) while minimizing identity disclosure (e.g., patient name). The utility objective is to generate the optimal matching function by allowing people to manually inspect the results of automatic linkage algorithms and to clean and standardize messy data. Together, these objectives promote both privacy and high-quality linkage. We compared the quality of human decision-making in record linkage using an interface that controls the amount of personal information disclosed and uses visual markup to highlight data discrepancies.
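To make the idea of controlled disclosure concrete, the following is a minimal sketch, not the PPIRL implementation. It assumes a simple position-by-position comparison of two field values: characters that agree are hidden, and only the discrepancies are revealed. The function name `mask_pair`, the `MASK` and `GAP` symbols, and the way the disclosed fraction is counted are all hypothetical choices made for illustration.

```python
from itertools import zip_longest
from typing import Tuple

MASK = "*"  # hypothetical symbol for a hidden character that agrees in both values
GAP = "_"   # hypothetical symbol for a position where one value has no character


def mask_pair(a: str, b: str) -> Tuple[str, str, float]:
    """Hide positions where two values agree and reveal only discrepancies.

    Returns both masked strings and the fraction of original characters
    that were disclosed (an assumed, illustrative disclosure measure).
    """
    out_a, out_b = [], []
    disclosed = 0
    for ch_a, ch_b in zip_longest(a, b):
        if ch_a == ch_b:
            # Agreement: the reviewer learns the values match, not what they are.
            out_a.append(MASK)
            out_b.append(MASK)
        else:
            # Discrepancy: reveal the differing characters so they can be judged.
            out_a.append(ch_a if ch_a is not None else GAP)
            out_b.append(ch_b if ch_b is not None else GAP)
            disclosed += (ch_a is not None) + (ch_b is not None)
    total = len(a) + len(b)
    fraction = disclosed / total if total else 0.0
    return "".join(out_a), "".join(out_b), fraction


if __name__ == "__main__":
    print(mask_pair("Jonathan", "Jonathon"))
```

In this sketch a reviewer comparing "Jonathan" with "Jonathon" would see only the single differing position in each value, which suggests how a linkage decision can rest on the discrepancy rather than on the full name.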
Results: We compared the quality of record linkage decisions by the number of characters disclosed. With good interface design, linkage decisions were comparable between the full mode, in which all information is disclosed, and the moderate mode, which had only 30% disclosure. As more values were masked for privacy, the quality of results began to suffer (p<0.001). However, even legally de-identified data could be linked correctly in most situations when properly masked: 0% disclosure still yielded 75% accuracy.
Conclusion: The results demonstrate that it is possible to greatly limit the amount of personal information available to human decision makers without negatively affecting utility or human effectiveness. Thus, incremental disclosure can significantly improve privacy protection with negligible impact on the quality of linkage.