Project Summary

Many important pediatric research questions involve evaluating growth, most often height and weight. For example, children with medical conditions such as celiac disease and congenital heart disease may have slow growth, and high-quality research can help clinicians identify and treat them. Childhood obesity is common, and knowing how best to prevent and treat obesity can improve health and quality of life while allowing children, families and healthcare providers to avoid ineffective interventions. 

Many studies of growth use data from electronic health records (EHRs) that are obtained during clinical care. Using EHR data has both advantages and disadvantages compared to using data collected specifically for research. One advantage is that EHR data are often available for many children, which makes it easier to study children with rare conditions. The only practical way to perform some studies is with EHR data. 

An important disadvantage is that height and weight measurements in EHR data generally have more error than measurements that are collected specifically for research. This error can make study results less accurate (i.e., cause bias). In the proposed work the study team aims to help researchers reduce bias caused by error in height and weight data from EHRs. 

Methods already exist to reduce the impact of error by cleaning data, which means removing clearly erroneous measurements from the data before analyzing them. Unfortunately, not all errors can be cleaned. For example, a child might wear thick-soled shoes when their height is measured. This can cause an error too small to be caught by cleaning but big enough to affect the child’s recorded height and calculated body mass index. 

It is known that measurement error can cause different kinds of bias in different studies. It is not known how the specific types of errors in growth measurements left behind after cleaning affect the results of common types of growth studies. There is also no tool that measures these types of errors in growth data from EHRs. This project will fill those knowledge gaps to help people plan better research about children’s growth and help them understand research results. 

First, the project team will identify study topics of interest to a stakeholder advisory board that includes patients, caregivers, medical assistants, researchers, clinicians, EHR data experts and statisticians. They will imitate studies on those topics using data from two large EHR data networks that do not include names or similar patient identifiers. Then, the study team will create versions of the same data with different types of error added and imitate the studies again. For example, they may add random error that mimics children getting measured with and without shoes on. 
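
As a rough illustration of the idea of adding error and repeating an analysis, the sketch below uses made-up heights and weights, adds a fixed amount of height to a random share of children as if they were measured with shoes on, and compares average body mass index before and after. All names and numbers in it (the 2 cm sole, the 30 percent share, the simulated averages) are assumptions chosen for illustration. They are not the study team's methods or data.

```python
import random
import statistics


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100
    return weight_kg / (height_m ** 2)


def add_shoe_error(heights_cm, share_with_shoes, sole_cm, rng):
    """Return heights where a random share of children gain sole_cm of height."""
    return [
        h + sole_cm if rng.random() < share_with_shoes else h
        for h in heights_cm
    ]


def simulate(n=1000, share_with_shoes=0.3, sole_cm=2.0, seed=0):
    """Compare mean BMI with and without simulated shoe error.

    The simulated children, the share wearing shoes, and the sole
    thickness are all hypothetical values, not real EHR data.
    """
    rng = random.Random(seed)
    heights = [rng.gauss(130, 8) for _ in range(n)]
    # Clamp weights so that every simulated weight stays positive.
    weights = [max(10.0, rng.gauss(30, 5)) for _ in range(n)]

    heights_with_error = add_shoe_error(heights, share_with_shoes, sole_cm, rng)

    clean_mean = statistics.mean(
        bmi(w, h) for w, h in zip(weights, heights)
    )
    error_mean = statistics.mean(
        bmi(w, h) for w, h in zip(weights, heights_with_error)
    )
    # Bias is the difference between the analysis with error and without.
    return clean_mean, error_mean, error_mean - clean_mean


if __name__ == "__main__":
    clean, with_error, bias = simulate()
    print(f"Mean BMI without error: {clean:.2f}")
    print(f"Mean BMI with shoe error: {with_error:.2f}")
    print(f"Bias: {bias:.3f}")
```

Because shoes only ever add height in this sketch, the average body mass index with error comes out lower than without it. A real study would need to consider many more error types and analyses than this single comparison.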

Comparing study results for analyses with and without error will allow the researchers to understand how much and which types of bias are caused by different errors in different types of studies.

Next, the study team will develop indicators of specific types of error using data from the research networks. Some indicators will be based on data-cleaning software they previously developed. The study team will test the indicators to make sure they work well. Then they will select the most important indicators based on several factors, including which quality issues produced the most bias and which quality issues are most important to stakeholders.

Finally, the study team will develop free software that uses these indicators to provide a detailed assessment of the errors in pediatric height and weight data in datasets from EHRs. The software will link its findings about the errors in a dataset to the evidence gathered in this project about how those errors affect research. Stakeholders will help to design how the results are presented.

This work can help improve health care for children in several ways. First, researchers can use information about data quality to choose datasets with less error and to plan an analysis that is less likely to be affected by the types of error found in a specific dataset. Second, the tool can help researchers report the quality of data in their studies, which gives patients, caregivers, clinicians and others more information to evaluate the trustworthiness of research results before incorporating them into decisions. Finally, data quality results can be given to practices and health systems to help them improve the quality of future measurements.

Project Information

Carrie Daymont, M.D., M.S.
Pennsylvania State University Hershey Medical Center
$1,117,251 *

Key Dates

36 months *
April 2024

*All proposed projects, including requested budgets and project periods, are approved subject to a programmatic and budget review by PCORI staff and the negotiation of a formal award contract.


Last updated: April 23, 2024