Results Summary
What was the project about?
Researchers often combine patient health data from different sources, such as claims and health records, to get a fuller picture of patients’ health. To combine these data, researchers use personal information, or PI, such as names and social security numbers.
Using PI to link data can put patient privacy at risk. Researchers can use computer software that hides PI to protect privacy, but they must make decisions about how much PI to hide. For example, they must decide how much PI to look at or how many records to review to make sure data are linked accurately.
In this project, the research team created and tested a new user interface called MiNDFIRL. MiNDFIRL can be used with record linkage, or RL, software to help researchers use less PI while maintaining accuracy.
What did the research team do?
The research team got input from RL experts and patients to design features of MiNDFIRL. The team then led two large studies with data analysts:
- In study 1, the team looked at how hiding different amounts of PI affected analysts’ ability to accurately link data.
- In study 2, the team tested ways to use less PI. For example, one way was to hide all PI but let analysts click to see only important parts of the PI. Another way had an image of a meter that showed how privacy risk increases when analysts use more PI to link data sets.
The research team also tested MiNDFIRL in two case studies with 12 data analysts at two medical schools. The case studies checked how much PI was needed to accurately link real patient health data. The team also tested if using MiNDFIRL increased the amount of time analysts took to link data.
What were the results?
The studies showed that hiding more PI made it harder to accurately link data. They also showed which features helped data analysts accurately link the data using the least amount of PI. Adding these features reduced the use of PI from 100 percent to 8 percent with the same level of accuracy. In the two case studies, MiNDFIRL helped analysts link real data using only 30 percent of PI. Using MiNDFIRL didn’t increase the time analysts took to link data.
What were the limits of the project?
MiNDFIRL can only be used with RL software. Future studies could add software so that MiNDFIRL can be used by itself to link data.
How can people use the results?
Researchers can consider using MiNDFIRL with RL software to accurately link patient data while protecting patient privacy.
How this project fits under PCORI’s Research Priorities
The research reported in this results summary was conducted using PCORnet®, the National Patient-Centered Clinical Research Network. PCORnet® is intended to improve the nation’s capacity to conduct health research, particularly comparative effectiveness research (CER), efficiently by creating a large, highly representative network for conducting clinical outcomes research. PCORnet® has been developed with funding from the Patient-Centered Outcomes Research Institute® (PCORI®).
Professional Abstract
Background
Record linkage (RL), which links a patient’s records across multiple databases such as electronic health records (EHRs) and insurance claims, can help improve comparative effectiveness research by providing a more complete picture of a patient’s health. But accurately linking patient records requires using personally identifiable information, such as names and social security numbers, which raises concerns about patient privacy. Statistical methods can mask personally identifiable patient data, but linking with masked data can decrease RL accuracy. Researchers also face challenges in using RL, including the need to manually review complex linkages, refine linkage models, and verify RL accuracy.
To facilitate effective use of RL, researchers can benefit from software support that can balance accuracy and patient privacy. Software with effective user interfaces can also help researchers identify choices that increase privacy risks.
Objective
To design effective user interfaces that can support human decision making in RL to facilitate accurate linking of patient records while protecting patient privacy.
Study Design
Design Element | Description |
---|---|
Design | Controlled experiments, case studies, surveys, and focus groups |
Data Sources and Data Sets | Summative evaluation: Case studies by RL experts with EHR data (N=10,000 total pairs with 303 manually reviewed pairs) and patient-generated data (N=1,055 total pairs with 187 manually reviewed pairs) |
Analytic Approach | |
Outcomes | |
Methods
The research team developed an open-source prototype user interface called MiNDFIRL, or Minimum Necessary Disclosure For Interactive Record Linkage, through iterative processes of design, development, and evaluation. The team obtained input for different software features from multiple studies conducted with diverse stakeholders, including RL experts, researchers who use RL, clinicians, privacy experts, and patients, as well as from two case studies with data analysts at medical schools.
The team also conducted two large-scale controlled experiments that examined how different designs affect data analysts’ ability to manually link records accurately.
- Study 1 examined the effect of varying levels of masking of personally identifiable information on data analysts’ ability to manually link records accurately.
- Study 2 examined the effect of a clickable interactive interface to promote ethical behavior in RL decision making. For example, the interface displayed privacy risks in different ways, such as a meter that showed the potential increase in risk when an additional personally identifiable data field was disclosed.
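The click-to-reveal idea and the risk meter described above can be illustrated with a minimal sketch. This is not MiNDFIRL's actual implementation. The class `MaskedPair`, its method names, and the choice to measure disclosure as the share of visible characters are all assumptions made for illustration; the project may define masking and privacy risk differently.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

MASK = "*"  # assumption: masked characters are shown as asterisks


@dataclass
class MaskedPair:
    """Hypothetical candidate record pair whose fields start fully masked."""
    record_a: Dict[str, str]
    record_b: Dict[str, str]  # assumed to have the same field names as record_a
    revealed: Set[str] = field(default_factory=set)

    def view(self) -> Dict[str, Tuple[str, str]]:
        """Return what the analyst currently sees for each field."""
        shown = {}
        for name in self.record_a:
            a, b = self.record_a[name], self.record_b[name]
            if name not in self.revealed:
                a, b = MASK * len(a), MASK * len(b)
            shown[name] = (a, b)
        return shown

    def reveal(self, name: str) -> None:
        """Simulate the analyst clicking a field to disclose it."""
        if name not in self.record_a:
            raise KeyError(name)
        self.revealed.add(name)

    def disclosed_fraction(self) -> float:
        """Share of all characters in the pair that are currently visible."""
        records = (self.record_a, self.record_b)
        total = sum(len(v) for r in records for v in r.values())
        seen = sum(len(r[n]) for r in records for n in self.revealed)
        return seen / total if total else 0.0

    def fraction_if_revealed(self, name: str) -> float:
        """Meter preview: disclosed fraction if `name` were revealed next."""
        preview = MaskedPair(self.record_a, self.record_b, self.revealed | {name})
        return preview.disclosed_fraction()


# Made-up example records; no real patient data.
pair = MaskedPair(
    {"first_name": "Maria", "last_name": "Lopez", "dob": "1980-02-14"},
    {"first_name": "Marie", "last_name": "Lopez", "dob": "1980-02-14"},
)
pair.reveal("first_name")
print(pair.view())
print(pair.disclosed_fraction(), pair.fraction_if_revealed("dob"))
```

In this sketch the "meter" is simply the disclosed fraction before and after a prospective click, so an analyst could see the cost of opening one more field before deciding whether the extra information is needed.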
Results
The two controlled experiments demonstrated that MiNDFIRL can substantially limit the amount of personally identifiable information disclosed during manual review (7.85% compared with 100% disclosure) without negatively affecting the quality of RL or completion time. Results from the expert review showed that allowing analysts some access to personally identifiable information yielded better RL accuracy compared with no access, and better patient privacy compared with access to all personally identifiable information. Both case studies showed that allowing analysts access to 30% of the personally identifiable information was sufficient to support their decision making in RL.
Limitations
MiNDFIRL is prototype software and needs to be fully developed for use in research. The research team did not investigate automated RL algorithms.
Conclusions and Relevance
MiNDFIRL can help protect patient privacy by assisting researchers in using the minimum amount of information necessary to accurately link records during human decision making in RL.
Future Research Needs
Future research could focus on incorporating automated RL algorithms to fully develop MiNDFIRL into a comprehensive RL system.
Peer-Review Summary
Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.
The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.
Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:
- The reviewers noted that the study trial was not described in detail. They stated that the researchers needed to provide more information about the study participants, allocation to groups, and randomization. The researchers clarified that this work was not a randomized clinical trial; this was a test of usability for the new software, and thus the report included sufficient data on participants and provided the level of detail typical of software development research. However, the researchers did provide more background information on study participants and explained that the study was not fully randomized, but participants were allocated to conditions following expected methods for this type of study.
- The reviewers pointed out missing information about how the researchers created the automatic record linkages used in this project. The researchers explained that the development of those models was only to create appropriate record samples to test the human-computer interaction at the heart of this study. The researchers did note that full data were available through GitHub.
- The reviewers asked the researchers to clarify the difference between study partners and study participants throughout the report. In particular, the report results section mentioned that the researchers received expert feedback on the new record linkage software but there had been no mention of this work in the methods section of the report and no data presented on this feedback in the results. The researchers moved the expert feedback description to the engagement section as these experts were considered partners rather than participants.