What was the research about?
Many healthcare systems use electronic health records. Researchers use data from these records in their studies. Some records have missing or incorrect data. When this happens, people might not be able to trust a study’s results. The research team wanted to:
- Create guidance to judge whether data that a study used were high quality
- Find new ways to display the quality of data
- Learn why researchers don’t always report the quality of data that they used in studies
What were the results?
The research team developed guidance to help people judge whether data in research studies were high quality. The guidance included ways to report quality. High-quality data are complete, believable, and reliable.
The research team found that the guidance helped researchers from six large healthcare systems judge and report data quality from electronic health records.
The study members created new ways to show data quality in pictures or graphs.
The research team found that cost, time, and lack of guidance were the primary reasons that researchers did not report on data quality.
Who was in the study?
About 100 people joined the study. The study members included healthcare workers, patient advocates, and policy makers. They also included project managers, people who work with healthcare data, and researchers. All the people in the study were interested in the quality of data that research studies use.
What did the research team do?
The research team held two in-person meetings and monthly online meetings for the study members. The team then used information from these meetings to write guidance about how to measure data quality. The team made sure that all study members agreed on the guidance.
The team used a website to ask for feedback from other people interested in data quality about the new guidance. The team tested the guidance using electronic health records from six large healthcare systems.
The research team asked study members for ideas about pictures and graphs that could show data quality. The research team also surveyed other researchers to find out what kept them from reporting on data quality.
What were the limits of the study?
The study included about 100 people interested in the quality of data used in research studies. Other people may have different ideas about looking at data quality.
People who took part in the study were interested in the use of electronic health records for research. The study didn’t include people who use other types of research data, such as data from science laboratories or from social media. People who use other types of data may have different ideas about reporting on data quality.
How can people use the results?
Having common guidance about measuring and reporting the quality of research data can help people understand whether data that studies used are high quality and trustworthy. Figuring out why researchers don’t report the quality of their data may lead to new ideas about how to better share the quality of data with everyone.
To create standards for evaluating and reporting data quality in electronic health records by
- Developing data-user-driven recommendations for evaluating and reporting data quality
- Defining and assessing a common model for storing data-quality measures
- Developing data-quality reports and visuals tailored to data users
- Exploring technical, professional, and policy barriers to increasing data-quality transparency
Data Sources and Data Sets
Qualitative data: transcripts from meetings with data users, including patients, patient advocates, healthcare policy makers, informatics professionals, statisticians, and clinical investigators
Quantitative data: 6 large health-system datasets representing 11,026 data-quality checks
Data collection: online webinars, face-to-face meetings, online surveys, and data-quality checks of health-system datasets
Data analysis: qualitative content analysis of meeting transcripts, iterative consensus development for terminology, data-quality-check analysis, descriptive statistics, analysis of variance, and exploratory factor analysis for survey results on barriers to reporting data-quality findings
Primary: standardized recommendations for evaluating data quality
Secondary: evaluation of data-quality checks from data networks, prototypes for storing and reporting data-quality results, and description of barriers to performing data-quality assessment and reporting findings
Patient-centered outcomes research relies on the increasing availability of operational patient-specific electronic data sources, including electronic health records. Because these data sources are typically developed for purposes other than research, challenges arise when attempting to analyze and report the data. Data-quality issues prevalent in electronic health records include missing, inaccurate, and inconsistent values.
The researchers used personal contacts with healthcare and research organizations to recruit 92 participants for two study groups. One group included patients, patient advocates, and healthcare policy makers. The second group included informatics professionals, statisticians, and clinical investigators. Study participants drafted data-quality terms, categories, and definitions during face-to-face workshops, monthly webinars, and 10 presentations at professional meetings.
The researchers also collected input from another 138 data-quality researchers on drafts of the data-quality terms, categories, and definitions through an online wiki. The researchers analyzed transcripts of the meetings using iterative thematic analysis to determine consensus-based data quality standards and reporting metrics.
The researchers recruited project leaders at six large health systems to perform data-quality checks on datasets using the data-quality standards generated by the study groups. The research team held workshops for study participants to generate data codes for a common data model and to explore effective ways to display data for data users.
The researchers also distributed an anonymous online survey to 141 data users to assess professional and personal barriers to data-quality reporting.
Based on feedback from the participant groups, the research team wrote a set of 20 recommendations for data-quality-reporting standards.
The team identified three major categories of data quality: conformance, or agreement of values with technical specifications; completeness, or the extent to which data are present or absent; and plausibility, or the extent to which data are believable or correct based on the technical specifications. The study groups further divided the categories into two contexts: verification, or checking that data conform to internal constraints or expectations; and validation, or checking that data conform to external constraints or expectations.
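The three categories and two contexts described above form a small grid of six cells. The following is a minimal sketch of that grid in Python, not the study's own tooling: the class names, the `completeness` helper, and the sample height values are all hypothetical and serve only as illustration.

```python
from enum import Enum
from itertools import product


class Category(Enum):
    CONFORMANCE = "conformance"    # values agree with technical specifications
    COMPLETENESS = "completeness"  # extent to which data are present or absent
    PLAUSIBILITY = "plausibility"  # extent to which data are believable


class Context(Enum):
    VERIFICATION = "verification"  # checked against internal expectations
    VALIDATION = "validation"      # checked against external expectations


# Each category can be assessed in either context: 3 x 2 = 6 cells.
CELLS = list(product(Category, Context))


def completeness(values):
    """Return the fraction of values that are present (not None)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


# Hypothetical column of recorded heights in cm; None marks a missing value.
heights_cm = [172.0, None, 165.5, 180.2]

print(len(CELLS))                # 6
print(completeness(heights_cm))  # 0.75
```

A completeness check of this kind, run against a site's own expectations, would fall in the completeness and verification cell of the grid.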
Study participants from the large health systems validated these categories with agreement in 11,023 of 11,026 data-quality checks.
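As a quick arithmetic check on the figures above, the agreement rate can be computed directly from the two reported counts. The variable names are illustrative only.

```python
# Counts reported in the study summary.
agreed_checks = 11_023
total_checks = 11_026

# Share of data-quality checks that agreed with the categories.
agreement_rate = agreed_checks / total_checks
unmatched = total_checks - agreed_checks

print(f"{agreement_rate:.4%}")  # 99.9728%
print(unmatched)                # 3
```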
The researchers generated a prototype data-quality common model, which provides a way to store the data-quality summary statistics independent of the data source. The researchers developed new models for data visualization using data-quality summary statistics from the common model. The visualizations offer ways to quickly identify data-quality features of large datasets for use by both informatics specialists and clinical investigators.
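One way to picture a source-independent store of data-quality summary statistics is a flat record that carries the same fields no matter which health system produced it. The sketch below is an assumption for illustration and does not reproduce the study's prototype: the `DataQualityResult` class, its field names, the site labels, and the counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataQualityResult:
    """One summary statistic from one data-quality check (hypothetical schema)."""
    source: str       # which dataset the check ran against
    table: str        # table that was checked
    field: str        # field that was checked
    category: str     # conformance, completeness, or plausibility
    context: str      # verification or validation
    numerator: int    # records that passed the check
    denominator: int  # records that were examined

    @property
    def proportion(self):
        """Passing share, or None when no records were examined."""
        if self.denominator == 0:
            return None
        return self.numerator / self.denominator


# Made-up results from two hypothetical sites, stored in the same shape.
results = [
    DataQualityResult("site_a", "visits", "discharge_date",
                      "completeness", "verification", 940, 1000),
    DataQualityResult("site_b", "visits", "discharge_date",
                      "completeness", "verification", 450, 500),
]

for r in results:
    print(r.source, r.field, f"{r.proportion:.1%}")
```

Because every record has the same shape, results from different sources can be compared or plotted side by side without reference to how each source stores its underlying data.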
Applying factor analysis to data from the online survey revealed three individual barriers to data-quality assessment and reporting: personal consequences, reporting-process problems, and lack of resources. The analysis also revealed two organizational barriers: environmental conditions and typical practices.
Although the data-quality and transparency standards reflect community engagement and consensus from interested and knowledgeable participants, the generated standards may not represent the concerns of all data users. Other approaches for evaluating data quality, such as implementation of the Delphi method, may yield alternative standards.
The data owners and users in this study represented communities that use electronic health records and administrative records. These results may not be applicable to users of genomic, biologic, and social media data.
Conclusions and Relevance
Based on multiple rounds of feedback from patients, researchers, and policy makers, the study team created a set of standards to guide assessment and reporting of data quality for electronic health records. The study’s data-quality standards, data-storage model, and data-reporting visuals may help researchers conduct analyses and report results more consistently and transparently, facilitating improved interpretation and comparison of study results among data users.
Future Research Needs
Future research could solicit additional input on the data-quality standards from individuals from other relevant communities. Additional studies could involve users of genomic, biologic, and social media data in the development of additional data-quality standards.
Future research could also examine the implementation of the recommendations and measure the costs and impact of such implementation.
Final Research Report
View this project's final research report.
Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.
The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.
Reviewers’ comments and the investigator’s changes in response included the following:
- The awardee provided more information about the Data Quality Collaborative (DQC) and its work in identifying key data quality recommendations.
- Based on reviewer recommendations, the awardee highlighted key study results involving harmonized data quality terms and recommendations by reorganizing the report by the three distinct categories of study findings.
- The reviewers requested that the investigator clarify the description of the factor analyses completed on the survey data, including replacing a more-technical table with a more-intuitive figure.
- The awardee added a discussion of the relevance and potential impact of this study on patient-centered outcomes research. The results improved the ability to assess data quality of a specific data set.