Patient health data are often scattered and incomplete. This fragmentation can be overcome by linking data from multiple sources, such as disease registries and administrative healthcare data. Record linkage (RL), also called entity resolution, is a family of technical methods that identify records in separate datasets that belong to the same individual. Accurate RL is essential for clinical quality and safety and has important applications in clinical and observational research. Linking patient records in one dataset (Dataset B) to patient records in another dataset (Dataset A) may (1) add values for data elements already present in Dataset A, (2) expand Dataset A with new data elements, or (3) update data elements already present in Dataset A with more recent or more accurate values.
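The three outcomes of linking Dataset B to Dataset A can be illustrated with a minimal sketch. The record layout (each data element holding a list of dated observations), the function names, and the example values are all assumptions made for illustration; they do not describe the project's actual data model.

```python
def merge_linked(record_a, record_b):
    """Merge a Dataset B record into the Dataset A record it was linked to.

    Hypothetical layout: {element_name: [(iso_date, value), ...]}.
    """
    merged = {name: list(obs) for name, obs in record_a.items()}
    for name, observations in record_b.items():
        if name not in merged:
            # Outcome (2): a data element that Dataset A did not have.
            merged[name] = list(observations)
            continue
        for observation in observations:
            if observation not in merged[name]:
                # Outcome (1): an additional value for an existing element.
                merged[name].append(observation)
    return merged


def current_value(merged, name):
    """Outcome (3): the most recent observation supersedes older ones."""
    # ISO 8601 date strings sort chronologically, so max() picks the latest.
    return max(merged[name], key=lambda observation: observation[0])[1]


record_a = {
    "weight_kg": [("2021-03-01", 82.0)],
    "smoker": [("2020-01-15", "yes")],
}
record_b = {
    "weight_kg": [("2022-06-10", 79.5)],
    "smoker": [("2023-02-01", "no")],
    "hba1c": [("2022-06-10", 6.1)],
}
merged = merge_linked(record_a, record_b)
```

In this sketch, `merged["weight_kg"]` holds both measurements, `hba1c` appears as a new element, and `current_value(merged, "smoker")` returns the more recent status from Dataset B.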
We will address three gaps in RL processes by developing (1) new methods to determine whether two datasets can be linked, (2) new methods to link new data to existing linked data without relinking all records, and (3) new methods to evaluate the impact of RL on data quality. In current practice, linkage accuracy can be measured only after records from two datasets have been linked. We aim to predict linkage accuracy before linkage by assessing data features of the linkage variables. Full linkage (relinking both old and new records) is often extremely inefficient in large networks: each update of the research network with new patient data, no matter how minor, requires full datasets to be pulled, processed, transferred, and linked, repeating work on datasets in which most patients have already been linked. Our incremental privacy-preserving record linkage (PPRL) method will efficiently link new data (i.e., incremental data) to old data without requiring human-readable data to be shared. Human reviewers will manually validate the accuracy of our method. We will measure and compare the quality of pre- and post-linkage data to quantify the impact of RL on data quality.
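The general idea of incremental linkage without sharing human-readable data can be sketched as follows. This is a simplified illustration, not the proposed method: it assumes exact matching on a single keyed hash (HMAC-SHA256) of normalized identifiers, whereas practical PPRL systems typically need to tolerate typographical variation. The field names, the `P000001`-style linked IDs, and the demo key are hypothetical.

```python
import hashlib
import hmac


def make_token(record, secret_key):
    """Turn identifying fields into a keyed hash that is not human-readable."""
    parts = [record["first_name"], record["last_name"], record["dob"]]
    normalized = "|".join(part.strip().lower() for part in parts)
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def link_incremental(token_index, new_records, secret_key):
    """Link only the new records against an existing token -> linked ID index.

    Records already in the index are never re-pulled or re-processed.
    The index is updated in place; the return value maps each new
    record's local ID to its linked ID.
    """
    assignments = {}
    for record in new_records:
        token = make_token(record, secret_key)
        if token not in token_index:
            # Unseen token: treat as a new individual (assumption of this sketch).
            token_index[token] = f"P{len(token_index) + 1:06d}"
        assignments[record["local_id"]] = token_index[token]
    return assignments


key = b"demo-key-for-illustration-only"
index = {}

# Initial load: the "old" data.
old_batch = [
    {"local_id": "A1", "first_name": "Ana", "last_name": "Lopez", "dob": "1980-02-01"},
]
first = link_incremental(index, old_batch, key)

# Later update: only the incremental records are tokenized and linked.
new_batch = [
    {"local_id": "B7", "first_name": " ana", "last_name": "LOPEZ ", "dob": "1980-02-01"},
    {"local_id": "B8", "first_name": "Raj", "last_name": "Patel", "dob": "1975-11-30"},
]
second = link_incremental(index, new_batch, key)
```

Here record `B7` resolves to the same linked ID as `A1`, and `B8` receives a new one, with only the two new records processed in the second call. The cost of an update therefore scales with the size of the increment rather than the size of the full network, which is the efficiency argument made above.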
Effective and meaningful communication with patients and stakeholders is important to the success of any methodology development project because they are the beneficiaries and users of the method being developed. However, RL methods, especially PPRL methods, are highly technical. Therefore, the technical team will enlist our patient team member to draft materials (e.g., language, visuals) that communicate effectively with patients. We will engage patients and stakeholders throughout study design, conduct, and dissemination to increase the likelihood that our method is disseminated and adopted, and to realize opportunities for patient-centered outcomes research. Methods to link data securely will indirectly benefit patients by (1) improving the quality of health data available for research and (2) protecting the security and privacy of patient data in the linkage process.