Methods: The multi-step linkage methodology sequentially improves the accuracy of the matches that make up the longitudinal, multi-source datasets. The methodology consists of four steps: 1) development of self-correcting, patient-level custom linkage profiles across databases; 2) deterministic (rule-based) record linkage using exact and fuzzy text matching; 3) probabilistic linkage using data mining algorithms; and 4) record linkage through clerical review.
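The deterministic, probabilistic, and clerical-review steps can be illustrated with a minimal Python sketch. It assumes hypothetical record fields (`first`, `last`, `dob`, `sex`) and uses the standard library's `difflib.SequenceMatcher` as the fuzzy text matcher. For the probabilistic step it swaps in Fellegi-Sunter-style agreement weights, one common probabilistic scoring approach. The abstract does not say which data mining algorithms were used, and the m/u probabilities and thresholds below are illustrative values, not estimates from the repository. Step 1 (custom linkage profiles) is not modeled.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Record:
    rec_id: str
    first: str
    last: str
    dob: str  # ISO date string, e.g. "2005-03-14" (hypothetical field layout)
    sex: str


def similarity(a: str, b: str) -> float:
    """Case-insensitive fuzzy similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deterministic_match(a: Record, b: Record, fuzzy: float = 0.85) -> bool:
    """Rule-based pass: exact agreement on all fields, or fuzzy name
    agreement combined with exact date of birth and sex."""
    if a.dob != b.dob or a.sex != b.sex:
        return False
    if a.first.lower() == b.first.lower() and a.last.lower() == b.last.lower():
        return True  # exact rule
    return similarity(a.first, b.first) >= fuzzy and similarity(a.last, b.last) >= fuzzy


# Hypothetical (m, u) per field: m = P(agree | true match), u = P(agree | non-match).
M_U = {
    "first": (0.95, 0.01),
    "last": (0.95, 0.005),
    "dob": (0.98, 0.001),
    "sex": (0.99, 0.5),
}
NAME_FIELDS = {"first", "last"}


def match_weight(a: Record, b: Record, fuzzy: float = 0.85) -> float:
    """Fellegi-Sunter-style score: sum of log2 likelihood ratios over fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        va, vb = getattr(a, field), getattr(b, field)
        agree = similarity(va, vb) >= fuzzy if field in NAME_FIELDS else va == vb
        total += math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))
    return total


def classify(a: Record, b: Record, upper: float = 15.0, lower: float = 5.0) -> str:
    """Deterministic pass first; otherwise threshold the probabilistic score.
    Pairs scoring between the two thresholds go to clerical review."""
    if deterministic_match(a, b):
        return "match"
    weight = match_weight(a, b)
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "clerical review"
```

With these illustrative values, comparing "Maria Lopez" to "Mariah Lopes" (same date of birth and sex) fails the rule-based pass because "Lopez" and "Lopes" score 0.8 on the fuzzy matcher. The probabilistic score of about 13.2 then falls between the two thresholds, so the pair is routed to clerical review rather than being accepted or rejected automatically.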
Result: The linked, 15-year repository developed using the multi-step linkage methodology hosts multiple linked statewide data sources, including birth, death, and fetal death certificates; Medicaid eligibility, encounters, and claims; hospital discharge, ambulatory, and emergency records; Healthy Start prenatal screens; Perinatal Intensive Care records; Early Intervention Program records; birth anomalies; academic performance; juvenile delinquency; and child maltreatment and foster care placement records. De-identified or limited datasets with linked records can be constructed to answer interdisciplinary research questions involving predictors and outcomes across multiple generations. These datasets are unique in that individuals are linked to their immediate family in multiple databases across years. This allows researchers to examine the onset of adverse conditions and then chart the trajectory of multiple risk and protective factors across the life span for individuals and their immediate family. In addition, researchers can identify how these factors are associated with the development of other conditions (e.g., asthma, diabetes, obesity).
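How a de-identified extract can still preserve both cross-database and family links can be sketched in Python. This is a minimal illustration, not the repository's procedure: it assumes linkage has already assigned each person a stable `person_id` and each child record a `mother_id`, and it uses a keyed hash (HMAC-SHA256 from the standard library) to replace those IDs with pseudonyms. The field names, the identifier list, and the key are hypothetical, and dropping a handful of fields is not by itself a complete de-identification procedure.

```python
import hashlib
import hmac

# Hypothetical secret key; it would be stored separately from any released extract.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Hypothetical direct-identifier fields removed from every row.
DIRECT_IDENTIFIERS = {"name", "ssn", "address", "medicaid_id"}


def pseudonym(person_id: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash: the same linked person always maps to the same pseudonym,
    but the original ID cannot be recomputed without the key."""
    return hmac.new(key, person_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def deidentify(rows: list[dict], key: bytes = SECRET_KEY) -> list[dict]:
    """Drop direct identifiers and pseudonymize person and mother IDs so that
    rows from different sources still join on person and on family."""
    out = []
    for row in rows:
        clean = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        clean["person_id"] = pseudonym(row["person_id"], key)
        if row.get("mother_id"):
            clean["mother_id"] = pseudonym(row["mother_id"], key)
        out.append(clean)
    return out
```

Because the pseudonym depends only on the linked ID and the key, a child's birth record, the same child's school record, and the mother's Medicaid record remain joinable after names and other direct identifiers are removed.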
Conclusion: We present a novel method for linking and cataloging data across multiple, disparate data sources. The linked data can be used to better understand how individuals and their immediate family interact within a set of structured, interconnected systems, including family, peers, school, and community, as well as more macro-level influences such as the health care system and the social welfare system. The linkage methodology makes it possible to advance knowledge on reducing disparities in long-term outcomes and on promoting health equity among individuals exposed to high-risk settings.