Comparing Accuracy of Health Record Data and Self-Reported Data

Results Summary

What was the project about?

For research studies, researchers can use data about patients’ health and treatments from electronic health records, or EHRs. They may also collect self-reported data directly from patients. But a patient’s EHR and self-reported data may not always agree. For example, differences may exist between the medicines that patients report taking and the medicines listed in their EHRs. Researchers don’t know which of these two data sources is more accurate.

In this project, the research team looked at EHR and self-reported data to learn which data source was more accurate.

What did the research team do?

The research team used EHR and self-reported data from 5,900 adult patients who received care from clinics and hospitals in North Carolina and Arkansas between December 2018 and December 2019.

First, the research team compared each patient’s EHR data with their self-reported data to check agreement between the two. The team compared data for:

34 health problems, such as obesity
Eight types of surgery or procedures, such as heart surgery
Hospital stays
Smoking status

Next, the research team interviewed 610 patients whose EHR and self-reported data were not the same. The team found out which information was correct. Based on these findings, the team created a new data set for the 610 patients with correct data. To measure the accuracy of the EHR and self-reported data sources, the team compared data from each source with the new data set.

Patients and doctors helped design and conduct the study.

What were the results?

Of the 5,900 patients, 5.5 percent had complete agreement between their EHR and self-reported data. For all other patients, agreement between EHR and self-reported data varied. For obesity data, 24 percent of patients had agreement between the EHR and self-reported data sources. For heart surgery data, 99 percent of patients had agreement. For 75 percent of patients who had hospital stays, the data sources had different admission dates.

Both EHR and self-reported data sources had missing data on health problems. For the 610 interviewed patients, self-reported data were more accurate than EHR data. The accuracy of the self-reported data source was equal to or greater than 90 percent for most of the data; the accuracy of the EHR data was less than 80 percent. Both data sources had less than 90 percent accuracy for obesity, high cholesterol, and hospital stays.

What were the limits of the project?

The research team measured the accuracy of data for specific health problems. Findings were different based on where patients lived. The results may not apply to other health conditions or patients from other states.

Future research could examine differences in data sources in other states to learn more about which source is more accurate to use.

How can people use the results?

Researchers can use the results when considering which data sources to use in studies.

Professional Abstract

Background

Patient-centered outcomes research can use electronic health record (EHR) data or self-reported data collected from patients. However, discrepancies between EHR data and self-reported data are common, and it is difficult for researchers to know which data source is more accurate. Previous research has not rigorously examined the reasons for discrepancies between EHR and self-reported data. Improved approaches for assessing data accuracy may guide researchers in choosing the appropriate data source for a study.

Objective

To assess agreement between EHR data and self-reported data and measure the accuracy of each data source

Study Design

Design Element	Description
Design	Empirical analysis, descriptive cross-sectional study
Data Sources and Data Sets	EHR and self-reported data from 5,900 adult patients, ages 18 and older, who received treatment between December 2018 and December 2019 from clinics and hospitals in North Carolina and Arkansas
Analytic Approach	Linkage and comparison of EHR and self-reported data on 45 data items, including 34 medical conditions, 8 procedures, hospitalizations, and smoking status (current smoking, ever smoked); qualitative analysis of interview data from 610 patients with discrepancies between EHR and self-reported data; development of a reference data set with verified information from interviews; measurement of data source accuracy by comparing EHR data and self-reported data individually to the reference data set
Outcomes	Agreement and accuracy

Methods

The research team enrolled and obtained EHR records and self-reported data for 5,900 adult patients receiving care at primary care and specialty clinics and hospitals in North Carolina and Arkansas between December 2018 and December 2019. In North Carolina, self-reported data came from the MURDOCK Community registry. In Arkansas, the team collected self-reported data directly from patients.

To measure agreement between the EHR data and self-reported data, the research team linked the data sources and conducted statistical analysis to compare data on 45 items, including 34 medical conditions, 8 procedures, hospitalizations, and smoking status.
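As an illustration of the agreement step, the following minimal sketch links two sources by patient ID and computes per-item and complete agreement. It is not the study's code: the record layout, the function names, the item names, and the choice to skip pairs with a missing value are all assumptions made for this example, and the sample values are invented.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record layout: {item name: True / False / None}.
# None marks a missing value. The summary does not say how the study
# handled missing data, so skipping such pairs below is an assumption.
Record = Dict[str, Optional[bool]]
Linked = Dict[str, Tuple[Record, Record]]


def link(ehr: Dict[str, Record], sr: Dict[str, Record]) -> Linked:
    """Pair each patient's EHR and self-reported records by shared ID."""
    return {pid: (ehr[pid], sr[pid]) for pid in ehr.keys() & sr.keys()}


def item_agreement(linked: Linked, item: str) -> Optional[float]:
    """Percent of patients whose two sources give the same value for one item."""
    pairs = [(e.get(item), s.get(item)) for e, s in linked.values()]
    pairs = [(e, s) for e, s in pairs if e is not None and s is not None]
    if not pairs:
        return None
    return 100.0 * sum(e == s for e, s in pairs) / len(pairs)


def complete_agreement(linked: Linked, items: Iterable[str]) -> float:
    """Percent of patients whose two sources match on every listed item."""
    items = list(items)
    if not linked:
        return 0.0
    full = sum(
        all(e.get(i) == s.get(i) for i in items) for e, s in linked.values()
    )
    return 100.0 * full / len(linked)


# Invented sample data for four linked patients and one unlinked patient.
ehr = {
    "p1": {"obesity": False, "bypass_surgery": False},
    "p2": {"obesity": True, "bypass_surgery": False},
    "p3": {"obesity": False, "bypass_surgery": True},
    "p4": {"obesity": False, "bypass_surgery": False},
}
sr = {
    "p1": {"obesity": True, "bypass_surgery": False},
    "p2": {"obesity": True, "bypass_surgery": False},
    "p3": {"obesity": True, "bypass_surgery": True},
    "p4": {"obesity": False, "bypass_surgery": False},
    "p5": {"obesity": True, "bypass_surgery": False},  # no EHR record
}

linked = link(ehr, sr)
for name in ("obesity", "bypass_surgery"):
    print(name, item_agreement(linked, name))
print("complete", complete_agreement(linked, ["obesity", "bypass_surgery"]))
```

Per-item agreement and complete agreement answer different questions, which is why a data set can show high agreement on individual items while few patients agree on every item.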

To assess the accuracy of each data source, the research team selected and interviewed 610 patients who had discrepancies between their EHR and self-reported data to identify the cause of each discrepancy. Except for age and ethnicity, interviewed patients were representative of the larger sample of study participants. The team created a reference data set for these patients by replacing data with confirmed data from the interviews. The team then compared each item in the EHR and self-reported data with the reference data set to calculate percent agreement, sensitivity, and specificity.
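The accuracy measures named above can be sketched as follows, using the standard definitions of sensitivity and specificity for one yes/no data item. This is an illustrative sketch only: the function name `accuracy_metrics` and the sample values are invented, and the study's actual computation is not described in this summary.

```python
from typing import Dict, Sequence


def accuracy_metrics(
    source: Sequence[bool], reference: Sequence[bool]
) -> Dict[str, float]:
    """Compare one data source with the reference data set for a single item.

    sensitivity = TP / (TP + FN): share of true "yes" values the source captures.
    specificity = TN / (TN + FP): share of true "no" values the source captures.
    """
    if len(source) != len(reference) or not reference:
        raise ValueError("source and reference must be non-empty and equal in length")

    pairs = list(zip(source, reference))
    tp = sum(1 for s, r in pairs if s and r)
    tn = sum(1 for s, r in pairs if not s and not r)
    fp = sum(1 for s, r in pairs if s and not r)
    fn = sum(1 for s, r in pairs if not s and r)

    def ratio(num: int, den: int) -> float:
        # Undefined when the reference has no positives (or no negatives).
        return num / den if den else float("nan")

    return {
        "percent_agreement": 100.0 * (tp + tn) / len(pairs),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Invented values for ten patients: four truly have the condition.
reference = [True, True, True, True, False, False, False, False, False, False]
ehr_values = [True, True, False, False, False, False, False, False, False, True]
sr_values = [True, True, True, False, False, False, False, False, False, False]

ehr_metrics = accuracy_metrics(ehr_values, reference)
sr_metrics = accuracy_metrics(sr_values, reference)
print("EHR:", ehr_metrics)
print("Self-report:", sr_metrics)
```

In this invented example the self-reported source misses fewer true cases than the EHR source, so its sensitivity is higher, while specificity differs less between the two.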

Patients and doctors gave input on the design and conduct of the study.

Results

Agreement. Of the 5,900 participants, 5.5% had complete agreement between their EHR and self-reported data. Overall agreement between the two data sources ranged from 24% for obesity to 99.2% for coronary artery bypass surgery.

Sensitivity and specificity. Both EHR and self-reported data sources had missing data for some data items. Of the 610 interviewed patients, self-reported data had greater sensitivity than EHR data. For self-reported data, sensitivity was equal to or greater than 90% for 18 of the 45 data items. For EHR data, sensitivity was less than 80% for 30 items. Both data sources had less than 90% sensitivity for obesity, high cholesterol, and hospitalizations. Specificity was similar in both data sources.

Limitations

The study examined data accuracy for specific health conditions and procedures. Findings varied based on location. Results may not generalize to other conditions or locations.

Conclusions and Relevance

In this study, self-reported data sensitivity was greater than EHR data sensitivity. Including self-reported data as a data source may improve the accuracy of patient-centered outcomes research.

Future Research Needs

Future research could examine regional differences between data sources to help determine which source is more accurate.

Final Research Report

View this project's final research report.

Peer-Review Summary

Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.

The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.

Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:

The reviewers asked the researchers to further address the generalizability of the study. The researchers added to the report discussion regarding which aspects of the results would be generalizable and which aspects would not.
The reviewers also asked for additional details on the gold standard identified by the researchers. They noted that because the gold standard appeared to include patient interviews, it would most likely favor the patient-report method of data collection over electronic health record data, and that errors were found in the patient interviews as well. The researchers responded that the gold standard could only be improved with full electronic health records or the availability of other data sources. They wrote that in reality the gold standard is the truth, but the accuracy measure used, patient interview, was flawed.
The reviewers requested that a more organized description of the differences between the two study sites be included in the report. The researchers added two tables to the report, one listing aspects of the study protocol that were altered for the second site and one table at the end of the results section listing the differences they found between the sites during the course of the study.

Conflict of Interest Disclosures

View the COI Disclosure Form

Project Information

Principal Investigator:

Anita Walden, MS^

Organization:

Oregon Health & Science University

Project Budget:

$80,913

DOI - Digital Object Identifier:

10.25302/03.2023.ME.140922573

Project Title:

Measuring and Talking to Patients about the Accuracy of Data used in PCOR

Key Dates

Approval Date:

April 2015

Project End Date:

January 2023

Year Awarded:

2015

Year Completed:

2023

Study Registration Information

HSR Project Number:

HSRP20153596

^Meredith Nahm Zozus, PhD, was the original principal investigator on this project. Dr. Zozus was affiliated with Duke University when this project was initially awarded.
