There is growing interest in using electronic health record (EHR) data as a practical resource for patient-centered research that draws on patient outcomes from real-world clinical settings. A large number of articles reporting health outcomes from EHR data are appearing in the medical literature, alongside others that raise concerns about data quality and misleading findings. For example, the EHR may contain incorrect information about medical diagnoses, dates of diagnoses, or treatments. Such errors can be corrected by carefully reviewing patient records and making changes where needed, but doing this for large numbers of patient records is expensive and time-consuming. Instead, data validation can be performed on a subset of records, and this information can be used to statistically correct estimates based on the larger dataset, most of which has not been validated.
The primary aims of this project are 1) to develop statistical methods that allow researchers to obtain accurate estimates using data that have been only partially validated, 2) to better understand which patient records should be validated to make the best use of resources, and 3) to apply the methods to a real-world study using EHR data. Existing statistical methods can handle only simple errors, whereas errors in EHR data tend to be more complicated and to span several variables (e.g., if the date of treatment initiation is incorrectly recorded, blood pressure at treatment initiation may also be incorrect). This project will extend existing statistical methods to handle errors commonly seen in EHR data. The study will address questions such as: Which records are most informative for correcting the analysis? If an initial subset of patient records is validated, what is the best way to use this information to select a second subset of records to validate?
Finally, the project team will apply what it learns to an ongoing research study from the Mid-South Clinical Data Research Network, which includes EHR data from millions of patients in the southeastern United States. In this study, the team will identify factors that affect the risk of early childhood obesity, such as a mother’s weight over time, and adjust the analysis for error patterns that can affect these risk factors. The project team will publish its results in scientific papers and develop publicly available software that implements its methods. Accurate study results are important for medical researchers, clinical providers, and patients, so that medical practice can be based on reliable, trustworthy information. However, research funds are limited, so completely validating the EHR before using it for medical research is impractical. The proposed statistical methods will yield more trustworthy results while saving researchers and their funders money. The project team will meet regularly with advisory committees of stakeholders (PCORnet leaders, funders, investigators, and patients) to ensure that the study’s methods are grounded in reality and of value to patient care.
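To make the idea of partial validation concrete, the following is a minimal sketch of one of the simplest corrections of this kind: a difference estimator for a mean. It is not the project's method, which targets more complicated errors across several variables. The function name `corrected_mean`, the variable names, and the toy numbers are all hypothetical, and the sketch assumes the validated records are a simple random sample of all records.

```python
from statistics import mean


def corrected_mean(error_prone, validated):
    """Estimate a mean from error-prone values plus a validated subset.

    error_prone: list of values as recorded in the EHR, one per record.
    validated:   dict mapping record index -> value found on chart review,
                 for the validated subset only (assumed to be a simple
                 random sample of all records).
    """
    if not validated:
        raise ValueError("need at least one validated record")
    # Naive estimate: uses every record, including any recording errors.
    naive = mean(error_prone)
    # Average recording error, estimated from the validated subset only.
    avg_error = mean(validated[i] - error_prone[i] for i in validated)
    # Shift the naive estimate by the estimated average error.
    return naive + avg_error


# Toy, made-up numbers for illustration only (not real patient data).
recorded = [30.0, 28.0, 35.0, 40.0, 27.0, 32.0]
chart_review = {0: 29.0, 3: 38.0}  # records 0 and 3 were validated
estimate = corrected_mean(recorded, chart_review)
```

In this sketch every record contributes to the estimate, while only the validated subset is used to learn how far the recorded values tend to be from the truth. Which records to validate, and how to handle errors that are linked across variables, are the harder questions the project addresses.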