Results Summary

What was the project about?

Researchers often have trouble collecting complete information on patient health, as patients may receive care at different places. Linking patient records from different places may help researchers get a more complete picture.

One way to link records is through personal information, such as names and birth dates. But this method increases risks to patient privacy. Another way, known as privacy-preserving record linkage, or PPRL, masks personal information. But current PPRL methods only work when linking entire sets of patient data, including data that have already been shared and linked. Linking entire data sets takes a long time. Also, sharing the same records multiple times increases data privacy risks.

In this study, the research team developed and tested a new PPRL method called incremental PPRL. This method links only new or updated data rather than re-linking entire data sets.
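
For readers who work with data, the idea can be sketched in a few lines of Python. This is not the research team's program. It is a simplified illustration that assumes personal information is masked with a keyed hash and that records match only when their masked values are identical. The key, the field names, and the function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared key, for illustration only.
SECRET_KEY = b"shared-secret-for-illustration-only"


def mask(name: str, birth_date: str) -> str:
    """Return a keyed hash so the raw name and birth date are never shared."""
    normalized = f"{name.strip().lower()}|{birth_date.strip()}"
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_incrementally(index: dict, new_records: list) -> dict:
    """Link only new or updated records against an existing index.

    `index` maps a masked token to a linked patient ID and is updated in place,
    so records linked earlier do not need to be shared or processed again.
    Returns a mapping from each new record's ID to its linked patient ID.
    """
    assignments = {}
    for record in new_records:
        token = mask(record["name"], record["birth_date"])
        if token not in index:
            # First time this masked value is seen: create a new linked ID.
            index[token] = f"P{len(index) + 1}"
        assignments[record["record_id"]] = index[token]
    return assignments
```

Calling `link_incrementally` a second time with only the newest records reuses the index built by the first call, which is the point of the incremental approach: earlier records stay where they are.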

What did the research team do?

The research team developed the new PPRL method based on current methods. They then tested the new method by using it to link test data sets they created. Next, the research team used real patient data to look at how well the new method performed compared to two current record linkage methods that link entire data sets.

The real patient data the research team used included records from 2011 to 2013 from five health systems in the Colorado Congenital Heart Disease registry. The team first linked records for 4,940 patients ages 11–64. They carefully reviewed the linked records to see if they were accurate. Then the team linked the same records using the new method and the two current methods. They compared the linked records from the new and current methods with the data set they had reviewed to see how well each method worked.

Patients, a patient representative, and other researchers gave input on the study.

What were the results?

The new method performed as well as the two current methods in linking patient records. All methods accurately linked records about 97 percent of the time.

The research team made their computer program for the new method available online for free.

What were the limits of the project?

Health systems had a hard time pulling only new or updated records from their data sets to use with the new method. Also, when the team used the new method with large data sets, it was less efficient. The research team only tested the new method with data from one state and one health problem.

Future research could test the new method with patient data for other states and health problems.

How can people use the results?

Researchers can reduce privacy risks by using the new method to link new or updated records with existing data sets.

Final Research Report

View this project's final research report.

Peer-Review Summary

Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.

The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments. 

Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:

  • The reviewers asked for more information about how the researchers established a gold standard dataset for testing the incremental privacy-preserving record linkage model. The researchers added a paragraph and a table to their methods section describing how they created the synthetic dataset that could be used to establish the gold standard. In response to the reviewers’ request for data on the usefulness of the gold standard dataset, the researchers explained that producing such metrics was not possible because the effort required would be too large.
  • The reviewers noted the limited generalizability of the methods developed in this study and asked the researchers to provide examples of how the methods could be used. The researchers added examples of how their data linkage methods could be used in practice.
  • Reviewers requested that the researchers revisit their definition for deterministic linkage as a method for linking two medical records. The researchers revised their definition from stating that the deterministic method requires all variables in the two records to match, to stating that it requires the set of variables used for matching to be identical. They went on to describe different examples of linkage variables that could be used in deterministic methods as well as the advantages and disadvantages of this method.
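
The revised definition of deterministic linkage in the last comment can be illustrated with a minimal Python sketch. The choice of matching variables and the function name are assumptions made for illustration. They are not taken from the report.

```python
# Hypothetical set of variables chosen for matching.
MATCH_VARIABLES = ("last_name", "birth_date", "sex")


def deterministic_match(record_a: dict, record_b: dict, variables=MATCH_VARIABLES) -> bool:
    """Return True only if every chosen matching variable is present and identical.

    Variables outside the chosen set (for example, an address) are ignored,
    so two records can match even when other fields differ.
    """
    return all(
        record_a.get(v) is not None and record_a.get(v) == record_b.get(v)
        for v in variables
    )
```

Under this definition, two records with the same last name, birth date, and sex are a match even if their addresses differ, while a difference in any one of the chosen variables means no match.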

Project Information

Toan Ong, PhD
Michael G. Kahn, MD, PhD
Anschutz Medical Campus - University of Colorado Denver
Incremental Privacy-Preserving Record Linkage (iPPRL) to Reduce Barriers to Data Sharing and Improve Data Quality^

Key Dates

November 2018
May 2023

Study Registration Information

^This project was previously titled: Incremental Privacy-Preserving Record Linkage to Improve Data Quality


Last updated: April 15, 2024