What was the project about?
For research studies, researchers can use data about patients’ health and treatments from electronic health records, or EHRs. They may also collect self-reported data directly from patients. But a patient’s EHR and self-reported data may not always agree. For example, differences may exist between the medicines that patients report taking and the medicines listed in their EHRs. Researchers don’t know which of these two data sources is more accurate.
In this project, the research team looked at EHR and self-reported data to learn which data source was more accurate.
What did the research team do?
The research team used EHR and self-reported data from 5,900 adult patients who received care from clinics and hospitals in North Carolina and Arkansas between December 2018 and December 2019.
First, the research team compared each patient’s EHR data with their self-reported data to check agreement between the two. The team compared data for:
- 34 health problems, such as obesity
- Eight types of surgery or procedures, such as heart surgery
- Hospital stays
- Smoking status
Next, the research team interviewed 610 patients whose EHR and self-reported data were not the same. The team found out which information was correct. Based on these findings, the team created a new data set for the 610 patients with correct data. To measure the accuracy of the EHR and self-reported data sources, the team compared data from each source with the new data set.
Patients and doctors helped design and conduct the study.
What were the results?
Of the 5,900 patients, 5.5 percent had complete agreement between their EHR and self-reported data. For all other patients, agreement between EHR and self-reported data varied. For obesity data, 24 percent of patients had agreement between the EHR and self-reported data sources. For heart surgery data, 99 percent of patients had agreement. For 75 percent of patients who had hospital stays, the data sources had different admission dates.
Both EHR and self-reported data sources had missing data on health problems. For the 610 interviewed patients, self-reported data were more accurate than EHR data. Self-reported data were 90 percent or more accurate for 18 of the 45 data items; EHR data were less than 80 percent accurate for 30 items. Both data sources had less than 90 percent accuracy for obesity, high cholesterol, and hospital stays.
What were the limits of the project?
The research team measured the accuracy of data for specific health problems. Findings were different based on where patients lived. The results may not apply to other health conditions or patients from other states.
Future research could examine differences in data sources in other states to learn more about which source is more accurate to use.
How can people use the results?
Researchers can use the results when considering which data sources to use in studies.
Patient-centered outcomes research can use electronic health record (EHR) data or self-reported data collected from patients. However, discrepancies between EHR data and self-reported data are common, and it is difficult for researchers to know which data source is more accurate. Previous research has not rigorously examined the reasons for discrepancies between EHR and self-reported data. Improved approaches for assessing data accuracy may guide researchers in choosing the appropriate data source for a study.
To assess agreement between EHR data and self-reported data and measure the accuracy of each data source
|Design||Empirical analysis, descriptive cross-sectional study|
|Data Sources and Data Sets||EHR and self-reported data from 5,900 adult patients, ages 18 and older, who received treatment between December 2018 and December 2019 from clinics and hospitals in North Carolina and Arkansas|
|Outcomes||Agreement and accuracy|
The research team enrolled and obtained EHR records and self-reported data for 5,900 adult patients receiving care at primary care and specialty clinics and hospitals in North Carolina and Arkansas between December 2018 and December 2019. In North Carolina, self-reported data came from the MURDOCK Community registry. In Arkansas, the team collected self-reported data directly from patients.
To measure agreement between the EHR data and self-reported data, the research team linked the data sources and conducted statistical analysis to compare data on 45 items, including 34 medical conditions, 8 procedures, hospitalizations, and smoking status.
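The linked comparison described above can be sketched in a few lines. This is a minimal, hypothetical illustration: the record layout, patient IDs, and item names are assumptions for the example, not the study's actual data model.

```python
def percent_agreement(ehr_records, self_report_records, items):
    """For each item, compute the percentage of linked patients whose
    EHR value matches their self-reported value.
    Records are dicts keyed by a patient ID (illustrative linkage)."""
    agreement = {}
    for item in items:
        matches = total = 0
        for pid, ehr in ehr_records.items():
            if pid not in self_report_records:
                continue  # patient could not be linked across sources
            total += 1
            if ehr.get(item) == self_report_records[pid].get(item):
                matches += 1
        agreement[item] = 100.0 * matches / total if total else float("nan")
    return agreement

# Toy data: two linked patients, two items (field names are illustrative)
ehr = {
    "p1": {"obesity": True, "smoker": False},
    "p2": {"obesity": False, "smoker": True},
}
survey = {
    "p1": {"obesity": False, "smoker": False},
    "p2": {"obesity": False, "smoker": True},
}
result = percent_agreement(ehr, survey, ["obesity", "smoker"])
# "obesity" agrees for 1 of 2 patients (50.0); "smoker" for 2 of 2 (100.0)
```

In the study, the same kind of per-item tally was run over 45 items for 5,900 patients.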
To assess the accuracy of each data source, the research team selected and interviewed 610 patients who had discrepancies between their EHR and self-reported data to identify the cause of each discrepancy. Except for age and ethnicity, interviewed patients were representative of the larger sample of study participants. The team created a reference data set for these patients by replacing data with confirmed data from the interviews. The team then compared each item in the EHR and self-reported data with the reference data set to calculate percent agreement, sensitivity, and specificity.
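For binary items (condition present or absent), the sensitivity and specificity calculations against the reference data set follow the standard definitions. The sketch below is an assumed implementation for illustration; the variable names and toy values are not from the study.

```python
def sensitivity_specificity(reported, reference):
    """Accuracy of a binary data source against a reference standard.
    sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for r, t in zip(reported, reference) if r and t)
    fn = sum(1 for r, t in zip(reported, reference) if not r and t)
    tn = sum(1 for r, t in zip(reported, reference) if not r and not t)
    fp = sum(1 for r, t in zip(reported, reference) if r and not t)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Toy example: one item's flags for 5 patients, with the
# interview-confirmed reference values playing the "truth" role
reference = [True, True, True, False, False]
reported = [True, True, False, False, False]
sens, spec = sensitivity_specificity(reported, reference)
# One true case was missed (sensitivity 2/3); no false positives (specificity 1.0)
```

Running this once per data source and per item, with the interview-corrected data as the reference, yields the per-item sensitivity and specificity figures reported in the results.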
Patients and doctors gave input on the design and conduct of the study.
Agreement. Of the 5,900 participants, 5.5% had complete agreement between their EHR and self-reported data. Overall agreement between the two data sources ranged from 24% for obesity to 99.2% for coronary artery bypass surgery.
Sensitivity and specificity. Both EHR and self-reported data sources had missing data for some data items. Among the 610 interviewed patients, self-reported data had greater sensitivity than EHR data. For self-reported data, sensitivity was equal to or greater than 90% for 18 of the 45 data items. For EHR data, sensitivity was less than 80% for 30 items. Both data sources had less than 90% sensitivity for obesity, high cholesterol, and hospitalizations. Specificity was similar in both data sources.
The study examined data accuracy for specific health conditions and procedures. Findings varied based on location. Results may not generalize to other conditions or locations.
Conclusions and Relevance
In this study, self-reported data sensitivity was greater than EHR data sensitivity. Including self-reported data as a data source may improve the accuracy of patient-centered outcomes research.
Future Research Needs
Future research could examine regional differences between data sources to help determine which source is more accurate.
Final Research Report
This project's final research report is expected to be available by November 2023.
Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.
The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.
Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:
- The reviewers asked the researchers to further address the generalizability of the study. The researchers added to the report discussion regarding which aspects of the results would be generalizable and which aspects would not.
- The reviewers also asked for additional details on the gold standard the researchers identified. They noted that because the gold standard appeared to include patient interviews, it would likely favor the patient-report method of data collection over EHR data, and that errors were found in the patient interviews as well. The researchers responded that the gold standard could be improved only with full electronic health records or the availability of other data sources. They wrote that in reality the gold standard is the truth, but the accuracy measure used, the patient interview, was flawed.
- The reviewers requested that the report include a more organized description of the differences between the two study sites. The researchers added two tables to the report: one listing aspects of the study protocol that were altered for the second site, and one, at the end of the results section, listing the differences they found between the sites during the course of the study.
Study Registration Information
^Meredith Nahm Zozus, PhD, was the original principal investigator on this project. Dr. Zozus was affiliated with Duke University when this project was initially awarded.