Professional Abstract
Objective
To create standards for evaluating and reporting data quality in electronic health records by
- Developing data-user-driven recommendations for evaluating and reporting data quality
- Defining and assessing a common model for storing data-quality measures
- Developing data-quality reports and visuals tailored to data users
- Exploring technical, professional, and policy barriers to increasing data-quality transparency
Study Design
| Design Element | Description |
| --- | --- |
| Design | Empirical analysis |
| Data Sources and Data Sets | Qualitative data: transcripts from meetings with data users, including patients, patient advocates, healthcare policy makers, informatics professionals, statisticians, and clinical investigators<br>Quantitative data: 6 large health-system datasets representing 11,026 data-quality checks |
| Analytic Approach | Data collection: online webinars, face-to-face meetings, online surveys, and data-quality checks of health-system datasets<br>Data analysis: qualitative content analysis of meeting transcripts, iterative consensus development for terminology, data-quality-check analysis, descriptive statistics, analysis of variance, and exploratory factor analysis of survey results on barriers to reporting data-quality findings |
| Outcomes | Primary: standardized recommendations for evaluating data quality<br>Secondary: evaluation of data-quality checks from data networks, prototypes for storing and reporting data-quality results, and description of barriers to performing data-quality assessment and reporting findings |
Patient-centered outcomes research relies on the increasing availability of operational patient-specific electronic data sources, including electronic health records. Because these data sources are typically developed for purposes other than research, challenges arise when attempting to analyze and report the data. Data-quality issues prevalent in electronic health records include missing, inaccurate, and inconsistent values.
The researchers used personal contacts with healthcare and research organizations to recruit 92 participants for two study groups. One group included patients, patient advocates, and healthcare policy makers. The second group included informatics professionals, statisticians, and clinical investigators. Study participants drafted data-quality terms, categories, and definitions during face-to-face workshops, monthly webinars, and 10 presentations at professional meetings.
The researchers also collected input from another 138 data-quality researchers on drafts of the data-quality terms, categories, and definitions through an online wiki. The researchers analyzed transcripts of the meetings using iterative thematic analysis to determine consensus-based data-quality standards and reporting metrics.
The researchers recruited project leaders at six large health systems to perform data-quality checks on datasets using the data-quality standards generated by the study groups. The research team held workshops for study participants to generate data codes for a common data model and to explore effective ways to display data for data users.
The researchers also distributed an anonymous online survey to 141 data users to assess professional and personal barriers to data-quality reporting.
Results
Based on feedback from the participant groups, the research team wrote a set of 20 recommendations for data-quality-reporting standards.
The team identified three major categories of data quality: conformance, or agreement of data values with technical specifications; completeness, or the extent to which expected data are present or absent; and plausibility, or the extent to which data values are believable or correct. The study groups further divided the categories into two assessment contexts: verification, or checking that data conform to internal constraints or expectations; and validation, or checking that data conform to external constraints or expectations.
Study participants from the large health systems validated these categories with agreement in 11,023 of 11,026 data-quality checks.
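To make the category and context definitions concrete, the sketch below classifies a few example data-quality checks along both axes. This is illustrative only: the names (`QualityCheck`, `run_checks`) and the specific checks are hypothetical, not the study's actual check logic.

```python
# Illustrative sketch: hypothetical data-quality checks labeled by the
# three categories (conformance, completeness, plausibility) and two
# contexts (verification = internal expectations, validation = external).
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class QualityCheck:
    name: str
    category: str  # "conformance" | "completeness" | "plausibility"
    context: str   # "verification" | "validation"
    passes: Callable[[dict], bool]


def run_checks(record: dict, checks: Sequence[QualityCheck]) -> dict:
    """Return pass/fail results keyed by check name."""
    return {c.name: c.passes(record) for c in checks}


checks = [
    # Conformance/verification: value agrees with internal technical specs.
    QualityCheck("sex_code_valid", "conformance", "verification",
                 lambda r: r.get("sex") in {"M", "F", "U"}),
    # Completeness/verification: an expected field is present.
    QualityCheck("birth_date_present", "completeness", "verification",
                 lambda r: r.get("birth_date") is not None),
    # Plausibility/validation: value is believable against external knowledge.
    QualityCheck("age_plausible", "plausibility", "validation",
                 lambda r: 0 <= r.get("age", -1) <= 120),
]

record = {"sex": "F", "birth_date": "1958-03-02", "age": 67}
print(run_checks(record, checks))
```

In this framing, each of the 11,026 checks performed by the health systems would carry exactly one category and one context label.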
The researchers generated a prototype data-quality common model, which provides a way to store the data-quality summary statistics independent of the data source. The researchers developed new models for data visualization using data-quality summary statistics from the common model. The visualizations offer ways to quickly identify data-quality features of large datasets for use by both informatics specialists and clinical investigators.
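A source-independent store of summary statistics, as the common model describes, might look like the following sketch. All field and class names here (`DataQualityResult`, `pass_rate`, and the example systems) are hypothetical illustrations, not the study's published model.

```python
# Hypothetical sketch of a data-quality common model: a container for
# data-quality summary statistics that is independent of the data source,
# so results from different health systems can be pooled and compared.
from dataclasses import dataclass, asdict
import json


@dataclass
class DataQualityResult:
    source_id: str    # which dataset produced the measure
    variable: str     # the field that was checked
    category: str     # conformance | completeness | plausibility
    context: str      # verification | validation
    checks_run: int
    checks_passed: int

    @property
    def pass_rate(self) -> float:
        return self.checks_passed / self.checks_run if self.checks_run else 0.0


# The same structure holds results from any source system.
results = [
    DataQualityResult("system_A", "birth_date", "completeness",
                      "verification", 1000, 987),
    DataQualityResult("system_B", "birth_date", "completeness",
                      "verification", 1000, 940),
]
print(json.dumps(
    [asdict(r) | {"pass_rate": r.pass_rate} for r in results], indent=2))
```

Because the stored statistics are summaries rather than raw records, such a model can be shared across institutions without exposing patient-level data, which is what enables the reusable visualizations described above.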
Applying factor analysis to data from the online survey revealed three individual barriers to data-quality assessment and reporting: personal consequences, reporting-process problems, and lack of resources. The analysis also revealed two organizational barriers: environmental conditions and typical practices.
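Exploratory factor analysis of this kind groups correlated survey items into a small number of latent factors. The sketch below runs the technique on synthetic Likert-style responses (not the study's data) in which nine items are deliberately driven by three underlying factors.

```python
# Illustrative only: synthetic survey data, not the study's responses.
# Demonstrates exploratory factor analysis recovering latent factors
# from correlated survey items.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_respondents, n_items = 141, 9

# Simulate 9 items driven by 3 latent factors plus noise.
latent = rng.normal(size=(n_respondents, 3))
loadings = np.zeros((3, n_items))
loadings[0, 0:3] = 1.0  # items for a "personal consequences"-like factor
loadings[1, 3:6] = 1.0  # items for a "reporting-process"-like factor
loadings[2, 6:9] = 1.0  # items for a "lack of resources"-like factor
responses = latent @ loadings + 0.3 * rng.normal(size=(n_respondents, n_items))

fa = FactorAnalysis(n_components=3, random_state=0).fit(responses)
print(fa.components_.shape)  # (3, 9): one loading per factor per item
```

Inspecting which items load heavily on each factor is what lets researchers name the factors, as the study team did for the individual and organizational barriers.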
Limitations
Although the data-quality and transparency standards reflect community engagement and consensus from interested and knowledgeable participants, the generated standards may not represent the concerns of all data users. Other approaches for evaluating data quality, such as implementation of the Delphi method, may yield alternative standards.
The data owners and users in this study represented communities that use electronic health records and administrative records. These results may not be applicable to users of genomic, biologic, and social media data.
Conclusions and Relevance
Based on multiple rounds of feedback from patients, researchers, and policy makers, the study team created a set of standards to guide assessment and reporting of data quality for electronic health records. The study’s data-quality standards, data-storage model, and data-reporting visuals may help researchers conduct analyses and report results more consistently and transparently, facilitating improved interpretation and comparison of study results among data users.
Future Research Needs
Future research could solicit additional input on the data-quality standards from individuals from other relevant communities. Additional studies could involve users of genomic, biologic, and social media data in the development of additional data-quality standards.
Future research could also examine the implementation of the recommendations and measure the costs and impact of such implementation.