Results Summary
What was the project about?
Researchers often combine data from different sources, such as insurance claims and health records, to get a better picture of patients’ health and use of health care. Researchers use unique identifiers, like Social Security numbers, to connect patient records and make them more complete. But sometimes this approach doesn’t work well, especially when records don’t have much personal information. Having limited personal data can lead to errors when linking records.
In this study, the research team created new methods to link data sets with limited personal information. Then they compared the new methods with existing ones. They also applied the new methods with real patient data.
What did the research team do?
The research team created two new methods to link records from two sources:
- BRLVOF, which uses extra patient information from one source along with identifiers in both sources to link records
- MLBRL, which matches information about patients and groups of patients, like patients who go to the same doctor
Then the research team created test data to see how the new methods perform in different situations. For example, they created mistakes and then changed the number of mistakes in the personal information. They compared the new methods to current methods, which use only linking identifiers. The team looked at how well each method linked patient records in each situation.
Using real patient data, the research team used the new methods to link a national injury list to Medicare data for patients who had a brain injury. They linked the two data sets to see if certain patient traits before the injury were related to patient recovery at a medical center.
Doctors gave input during the study.
What were the results?
With the test data, BRLVOF linked patient records better than the current methods did when patient data were missing or wrong. MLBRL also did better than the current methods when linking records for groups of patients who get care from the same doctor.
When the research team used the new methods to link real patient data, the results showed that patient traits before a brain injury were not related to patient health outcomes at care centers.
What were the limits of the project?
The research team only tested the new methods using data from patients who have Medicare. Results may differ for patients with other types of insurance.
Future research can check how well the new methods work for linking different data sources and data for patients with other health issues.
How can people use the results?
Researchers can use the new methods to link patient data from data sets that have limited personal information.
Professional Abstract
Background
To get a more complete picture of patients’ health and healthcare utilization, researchers use record linkage methods to combine patient health data from different sources, such as health insurance claims and health records. When unique patient identifiers, like Social Security numbers, are not available, researchers may use semi-identifiable information, like date of birth and ZIP code, to link records. However, using semi-identifiable information may not link records accurately, particularly when data are inaccurate or incomplete. Current analysis methods for linked data sets do not account for these issues, which can lead to biased or inaccurate results.
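To make the idea of linking on semi-identifiable information concrete, here is a minimal sketch of a classical Fellegi-Sunter style match weight. This is not the research team's method. The field names and the m- and u-probabilities (the chance a field agrees for a true match and for a non-match) are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical agreement probabilities, for illustration only:
#   m = P(field agrees | records belong to the same patient)
#   u = P(field agrees | records belong to different patients)
FIELD_PROBS = {
    "date_of_birth": {"m": 0.95, "u": 0.01},
    "zip_code": {"m": 0.90, "u": 0.05},
    "sex": {"m": 0.98, "u": 0.50},
}


def match_weight(record_a, record_b, field_probs=FIELD_PROBS):
    """Sum log2 likelihood ratios over fields present in both records."""
    weight = 0.0
    for field, p in field_probs.items():
        a, b = record_a.get(field), record_b.get(field)
        if a is None or b is None:
            continue  # a missing field adds no evidence either way
        if a == b:
            weight += math.log2(p["m"] / p["u"])
        else:
            weight += math.log2((1 - p["m"]) / (1 - p["u"]))
    return weight


claim = {"date_of_birth": "1950-03-14", "zip_code": "02903", "sex": "F"}
chart_same = {"date_of_birth": "1950-03-14", "zip_code": "02903", "sex": "F"}
chart_other = {"date_of_birth": "1948-11-02", "zip_code": "10027", "sex": "F"}
chart_sparse = {"date_of_birth": "1950-03-14", "zip_code": None, "sex": "F"}

w_same = match_weight(claim, chart_same)
w_other = match_weight(claim, chart_other)
w_sparse = match_weight(claim, chart_sparse)
```

In this sketch, a record with a missing ZIP code still agrees on the other fields but earns a lower weight than a full match, which shows why sparse or erroneous identifiers make links harder to confirm.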
In this study, the research team developed and tested new record linkage methods for linking two data sets under different conditions when limited identifiers are present.
Objective
To develop and test new methods to link two data sets and account for limited patient identifiers
Study Design
Design Element | Description |
---|---|
Design | Simulation studies; empirical analysis |
Data Sources and Data Sets | |
Analytic Approach | |
Outcomes | Simulation study: model performance metrics for true positive rate, positive predictive value, F1 score, and average coverage criterion. Empirical analysis: discharge from inpatient rehabilitation facility with no readmission or death for 30 days following discharge |
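
Three of the simulation metrics named in the table can be sketched as follows, assuming each link is represented as a pair of record IDs and that the true links are known, as they would be in simulated data. The function name and the example IDs are hypothetical, and the average coverage criterion is not shown.

```python
def linkage_metrics(predicted_links, true_links):
    """Compute true positive rate, positive predictive value, and F1 score."""
    predicted, truth = set(predicted_links), set(true_links)
    true_positives = len(predicted & truth)
    # Share of true links that the method recovered
    tpr = true_positives / len(truth) if truth else 0.0
    # Share of declared links that are correct
    ppv = true_positives / len(predicted) if predicted else 0.0
    # Harmonic mean of the two
    f1 = 2 * tpr * ppv / (tpr + ppv) if (tpr + ppv) else 0.0
    return {"tpr": tpr, "ppv": ppv, "f1": f1}


# Hypothetical example: IDs from file A paired with IDs from file B
true_links = {(1, "a"), (2, "b"), (3, "c"), (4, "d")}
predicted_links = {(1, "a"), (2, "b"), (3, "x")}
metrics = linkage_metrics(predicted_links, true_links)
```

Here two of the four true links are recovered and two of the three declared links are correct, so the true positive rate is 1/2, the positive predictive value is 2/3, and the F1 score is 4/7.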
Methods
To improve linkage accuracy when few identifiers are available, the research team developed two new Bayesian record linkage algorithms:
- Bayesian Record Linkage with Variables in One File (BRLVOF), an algorithm to link two data sets that considers non-linking variables in one data set and uses relationships between non-linking variables from each data set
- Multilayer Bayesian Record Linkage (MLBRL), an algorithm that links data sources by simultaneously accounting for patient identifiers and grouping entities in the data set, such as the provider for a group of patients
The research team conducted simulation analyses to compare the new methods with existing record linkage methods under different scenarios, such as varying error levels for linking variables and model misspecification.
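One way to vary the error level of linking variables in a simulation is sketched below. This is an assumed, simplified error model, not the team's simulation design: each listed field is corrupted independently with a fixed probability by swapping one character for a different digit. The function names, field names, and records are hypothetical.

```python
import random


def corrupt_value(value, rng):
    """Replace one randomly chosen character with a different digit."""
    if not value:
        return value
    pos = rng.randrange(len(value))
    choices = [d for d in "0123456789" if d != value[pos]]
    return value[:pos] + rng.choice(choices) + value[pos + 1:]


def inject_errors(records, fields, error_rate, seed=0):
    """Return copies of records with each field corrupted at error_rate."""
    rng = random.Random(seed)  # seeded so a scenario can be reproduced
    corrupted = []
    for record in records:
        new_record = dict(record)  # leave the original record untouched
        for field in fields:
            if rng.random() < error_rate:
                new_record[field] = corrupt_value(str(new_record[field]), rng)
        corrupted.append(new_record)
    return corrupted


records = [
    {"id": 1, "date_of_birth": "1950-03-14", "zip_code": "02903"},
    {"id": 2, "date_of_birth": "1948-11-02", "zip_code": "10027"},
]
linking_fields = ["date_of_birth", "zip_code"]

# One corrupted copy of the file per hypothetical error level
scenarios = {
    rate: inject_errors(records, linking_fields, rate, seed=42)
    for rate in (0.0, 0.5, 1.0)
}
```

Running a linkage method on each corrupted copy and scoring it against the known true links gives one performance value per error level, which is the kind of comparison the scenarios describe.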
The research team then applied the methods to link the National Trauma Data Bank (NTDB) data set to Medicare claims data for patients who went to inpatient care facilities after a traumatic brain injury. The team examined the linked data set to identify factors associated with recovery outcomes.
Clinicians provided input during the study.
Results
Compared with models using existing Bayesian record linkage methods, the BRLVOF models performed better in nearly all scenarios. Adjusting for relationships between variables exclusive to each file resulted in more accurate links.
The MLBRL models performed better than models built with existing methods, which do not link group and individual records simultaneously.
The research team did not find significant relationships between pre-injury characteristics and patients’ health outcomes at inpatient rehabilitation facilities.
Limitations
The research team applied the new methods to Medicare data. Results may differ when linking data from other patients and for different health outcomes.
Conclusions and Relevance
The new methods were better at linking records than existing methods. Researchers can use these methods to link two data sources with limited patient identifiers.
Future Research Needs
Future research could test how well the new linkage methods work for linking different patient data sets and for use with other health outcomes.
Final Research Report
This project's final research report is expected to be available by July 2024.
Peer-Review Summary
The Peer-Review Summary for this project will be posted here soon.