Results Summary
What was the project about?
Randomized controlled trials, or RCTs, look at how well treatments work. But people who take part in RCTs may differ from patients who receive care in clinics. For instance, patients who take part in RCTs may be less likely to smoke or may have fewer health problems. These differences can affect how well a treatment works. As a result, a treatment may work differently for a patient receiving care in a clinic than it did for patients who took part in the RCT.
Researchers can use statistical methods to account for differences in patient traits and behaviors. In this project, the research team developed and tested new methods to account for these differences. They used the methods to apply RCT results to patients receiving care in clinics.
What did the research team do?
The research team combined data from an RCT with data from patients receiving care in clinics. The team looked at two ways of combining the data:
- Nested design. The RCT data were inserted, or nested, within data from patients receiving care in clinics.
- Non-nested design. The team added data from patients receiving care in clinics to the RCT data.
The research team then created three types of statistical methods to analyze the data from both study designs. The team used the methods to look at how well the treatment would work for the patients receiving care in clinics. The team tested the accuracy of the methods using data created by a computer and data collected from patients.
Patients and healthcare providers helped advise the study team.
What were the results?
All three types of methods worked correctly if enough data were included from the patients receiving care in clinics. If enough data weren’t available, then the findings from the RCT couldn’t be applied to clinic patients.
What were the limits of the project?
The methods need further development if data from patients receiving care in clinics are incomplete or have errors. To use the methods, researchers need to make sure that they have data on the patient traits that may affect how well a treatment works.
Future studies could explore ways to use the methods with other health problems and different data sets.
How can people use the results?
Researchers can use these methods to apply RCT results to patients receiving care in clinics.
Professional Abstract
Background
Patients seen in clinical practice may differ from those represented in randomized controlled trials (RCTs). For example, they may have different distributions of characteristics such as comorbidities. As a result, doctors cannot be sure that treatments tested in RCTs will work the same for patients seen in clinical practice. Statistical methods that generalize inferences from RCTs can help overcome such limitations.
Objective
To develop methods for extending RCT results to patients in clinical practice
Study Design
Design Element | Description |
---|---|
Design | Conceptual framework development; simulation studies; empirical analysis |
Data Sources and Data Sets | Data from an RCT, observational data from clinical practice, simulation data |
Analytic Approach | Causal inference analysis in nested and non-nested study designs using different estimation methods based on outcome modeling, modeling probability of participation with inverse odds weighting, and a combination of outcome modeling and modeling the probability of participation |
Outcomes | Mean of treatment outcomes such as average treatment effect, probability of trial participation, probability of treatment among randomized individuals, performance of different estimation methods using mean bias and variance |
Methods
Researchers developed and evaluated methods for extending RCT results to patients seen in clinical practice.
Researchers combined baseline covariate, treatment, and outcome data from an RCT with baseline covariate data from a patient population. Based on the RCT’s eligibility and participation criteria, researchers combined the data using two designs:
- Nested trial designs in which the RCT data were embedded in a random sample of data from individuals from the patient population
- Non-nested trial designs in which the RCT data were added to a random sample of data from individuals from the patient population
Under each study design, researchers specified identifiability conditions for extending inferences from RCTs to patients seen in clinical practice. Identifiability conditions are the conditions that must hold to determine average potential treatment outcomes in patients who did not participate in the RCT. For example, one condition is that the treatment effect found in the RCT stems from the randomized treatment and not from other factors, such as trial participation itself.
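As an illustration, conditions of this kind are often written as follows in the transportability literature. The notation is an assumption made here and may differ from the report's: X denotes baseline covariates, S = 1 indicates trial participation, A is the assigned treatment, and Y^a is the potential outcome under treatment a. Together with randomization in the trial and consistency of potential outcomes, the first two conditions give the third line, which expresses the target-population mean using only observable data.

```latex
\begin{align*}
\text{Exchangeability over participation:}\quad
  & \mathrm{E}[Y^a \mid X, S=1] = \mathrm{E}[Y^a \mid X, S=0] \\
\text{Positivity of participation:}\quad
  & \Pr[S=1 \mid X=x] > 0 \ \text{for every } x \text{ seen among nonparticipants} \\
\text{Identification:}\quad
  & \mathrm{E}[Y^a \mid S=0]
    = \mathrm{E}\big[\, \mathrm{E}[Y \mid X, S=1, A=a] \,\big|\, S=0 \,\big]
\end{align*}
```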
Researchers then developed three types of statistical methods, called estimators, for estimating treatment outcomes and average treatment effects based on outcome modeling, modeling the probability of participation in the trial, and a combination of outcome modeling and modeling the probability of participation.
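The three estimator types can be sketched for a non-nested design as follows. This is a minimal illustration, assuming a single discrete baseline covariate so that each model reduces to a stratum-level mean or proportion. The function name, data layout, and numbers are hypothetical and are not taken from the report.

```python
from collections import Counter, defaultdict

def transport_estimates(trial, target_x, arm):
    """Estimate the mean outcome under `arm` in the target population.

    trial    : list of (covariate, treatment, outcome) for trial participants
    target_x : list of covariate values for sampled nonparticipants
    """
    n_trial = Counter(x for x, _, _ in trial)            # participants per stratum
    n_arm = Counter(x for x, a, _ in trial if a == arm)  # participants in `arm` per stratum
    y_sum = defaultdict(float)
    for x, a, y in trial:
        if a == arm:
            y_sum[x] += y
    n_target = Counter(target_x)

    # Positivity: every target stratum must be represented in this trial arm.
    missing = [x for x in n_target if n_arm[x] == 0]
    if missing:
        raise ValueError(f"no trial data in arm {arm} for strata {missing}")

    g = {x: y_sum[x] / n_arm[x] for x in n_arm}                # outcome model
    inv_odds = {x: n_target[x] / n_trial[x] for x in n_trial}  # inverse odds of participation
    p_arm = {x: n_arm[x] / n_trial[x] for x in n_trial}        # probability of treatment in trial

    n0 = len(target_x)
    in_arm = [(x, y) for x, a, y in trial if a == arm]

    outcome_model = sum(g[x] for x in target_x) / n0
    weighting = sum(inv_odds[x] / p_arm[x] * y for x, y in in_arm) / n0
    doubly_robust = (
        sum(inv_odds[x] / p_arm[x] * (y - g[x]) for x, y in in_arm)
        + sum(g[x] for x in target_x)
    ) / n0
    return {
        "outcome_model": outcome_model,
        "inverse_odds_weighting": weighting,
        "doubly_robust": doubly_robust,
    }

# Hypothetical toy data: the trial is balanced on x, the target is mostly x = 1.
trial = [
    (0, 1, 1.0), (0, 1, 3.0), (0, 0, 0.0), (0, 0, 2.0),
    (1, 1, 5.0), (1, 1, 7.0), (1, 0, 2.0), (1, 0, 4.0),
]
target_x = [0, 1, 1, 1]

treated = transport_estimates(trial, target_x, arm=1)
control = transport_estimates(trial, target_x, arm=0)
average_treatment_effect = treated["outcome_model"] - control["outcome_model"]
```

With stratum-level (saturated) models like these, the three estimates coincide exactly. With parametric models fit to richer covariates they generally differ, which is where the differences in bias and variance among the estimator types arise.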
To evaluate the estimators’ performance, researchers conducted studies using simulations and empirical data. Researchers used sensitivity analyses to assess how the methods worked when the identifiability conditions did not hold.
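A simulation study of this kind can be sketched as a loop that repeatedly generates data, applies each estimator, and summarizes mean bias and variance against a known truth. The data-generating values below (covariate prevalences, effect sizes, sample sizes) are hypothetical choices for illustration, not the settings used in the project.

```python
import random
from statistics import mean, pvariance

# Hypothetical truth: the treatment effect is 1 + 2x and Pr(x = 1) = 0.7 in the target.
TRUE_TARGET_EFFECT = 1.0 + 2.0 * 0.7

def simulate_once(rng, n_trial=400, n_target=400):
    """Return (trial-only estimate, transported estimate) for one simulated data set."""
    cells = {(x, a): [0.0, 0] for x in (0, 1) for a in (0, 1)}  # [outcome sum, count]
    arms = {0: [0.0, 0], 1: [0.0, 0]}
    for _ in range(n_trial):
        x = 1 if rng.random() < 0.3 else 0   # trial participants: Pr(x = 1) = 0.3
        a = 1 if rng.random() < 0.5 else 0   # randomized treatment
        y = 1.0 + a * (1.0 + 2.0 * x) + rng.gauss(0.0, 1.0)
        for cell in (cells[(x, a)], arms[a]):
            cell[0] += y
            cell[1] += 1
    target_x = [1 if rng.random() < 0.7 else 0 for _ in range(n_target)]

    # Trial-only difference in means, ignoring the covariate shift.
    trial_only = arms[1][0] / arms[1][1] - arms[0][0] / arms[0][1]
    # Outcome-model estimate: stratum-specific effects averaged over the target sample.
    effect = {
        x: cells[(x, 1)][0] / cells[(x, 1)][1] - cells[(x, 0)][0] / cells[(x, 0)][1]
        for x in (0, 1)
    }
    transported = mean(effect[x] for x in target_x)
    return trial_only, transported

def performance(n_reps=200, seed=0):
    """Mean bias and variance of each estimator across simulated data sets."""
    rng = random.Random(seed)
    draws = [simulate_once(rng) for _ in range(n_reps)]
    summary = {}
    for name, values in (("trial_only", [d[0] for d in draws]),
                         ("transported", [d[1] for d in draws])):
        summary[name] = {
            "bias": mean(values) - TRUE_TARGET_EFFECT,
            "variance": pvariance(values),
        }
    return summary
```

Calling `performance()` returns the bias and variance of both estimators. In this setup the trial-only estimate targets the effect among trial participants rather than the target population, so its bias relative to the target-population effect does not shrink as the sample size grows.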
Patients and healthcare providers provided input to the study team.
Results
All estimators were unbiased when the model was correctly specified and included sufficient covariate data from the patient population. The estimators that were based on outcome modeling had the lowest variance followed by the estimators that combined the outcome model and the probability of participation model. If the estimator was biased, inferences from the RCT could not be extended to patient populations.
Limitations
The methods do not work when identifiability conditions are violated. In such cases, researchers can use sensitivity analyses to examine the impact of violations of the conditions. The methods do not address missing covariate data in the patient population, measurement error, or clustering. Applying the methods requires collecting baseline covariate data from patient populations of interest.
Conclusions and Relevance
Researchers can use these methods for extending causal inferences from RCTs to patients seen in clinical practice.
Future Research Needs
Future research could test the methods with other health outcomes and diverse data sources.
COVID-19-Related Study
Statistical Methods to Apply Predicted Health Outcomes from One Patient Group to Another
Results Summary
In response to the COVID-19 public health crisis in 2020, PCORI launched an initiative to enhance existing research projects so that they could offer findings related to COVID-19. The initiative funded this study and others.
What was this COVID-19 study about?
Predictive models are statistical models that can predict a patient’s risk for a specific event, such as severe illness or death. These models can help doctors tailor treatments. But for new illnesses, such as COVID-19, data to create such models are limited.
Some health facilities, like nursing homes, do have patient data on COVID-19. But their patients may differ from other patients in traits like age or other health problems, which could affect health outcomes for COVID-19. Current models can’t use data from one group of patients to accurately predict health outcomes in another group.
In this project, the research team developed and examined new methods for using data from one group of patients to predict health outcomes in another group. First, the team created predictive models to see whether patients with COVID-19 in one nursing home would recover. Then they developed methods to apply the prediction models from the nursing home to a second one that didn’t have much patient data. Last, the team assessed how well the methods worked to predict health outcomes for patients in the second nursing home.
What were the results?
The new methods helped improve the models for predicting whether patients are likely to recover from COVID-19.
What did the research team do?
The research team first used data created by a computer that looked like real patient data. They used the new methods to create predictive models from one data group for use with another data group. They checked the accuracy of the predictive models using new and existing methods.
The research team also used data from patients in two nursing homes. Using data from 10,648 patients in the larger nursing home, the team developed a model to predict the chances of recovery from COVID-19. Then they used this model to predict recovery for 1,184 patients in the smaller nursing home.
Of the patients in the larger nursing home, 75 percent were White, 14 percent were Black, and 5 percent were Latino. The average age was 77, and 61 percent were women. Of the patients in the smaller nursing home, 75 percent were White, 3 percent were Black, and 8 percent were Latino. The average age was 76, and 55 percent were women.
Patients and doctors helped design the study and interpret the results.
What were the limits of the study?
For the methods to work well, patient traits, like age and sex, need to be similar between the two groups.
Future research could study how to improve these methods so that they work well when patient traits are not similar across groups.
How can people use the results?
Researchers and doctors can use these methods to predict health outcomes for different groups of patients receiving care for COVID-19.
Professional Abstract
Background
With new illnesses like COVID-19, patient data are often limited, making it difficult to develop accurate predictions for health outcomes. Further, available COVID-19 data may come from patients in research studies or from a limited number of healthcare facilities and, therefore, may not represent all patients. New statistical methods could help extend COVID-19 prediction models from a source population to other populations with different characteristics.
Objective
To develop and evaluate methods for extending statistical models to predict COVID-19 outcomes, such as mortality, in different populations
Study Design
Design Element | Description |
---|---|
Design | Prediction model development; simulation studies; empirical analysis |
Population | Minimum Data Set and electronic health record data from 10,648 and 1,184 patients from two nursing home facilities who developed COVID-19 |
Outcomes | Mean squared error (Brier score) to measure model performance for predicting all-cause 30-day mortality in nursing home patients with COVID-19 |
Data Collection Timeframe | March–September 2020 |
Methods
Researchers first developed a set of new methods to tailor prediction models from a source population to a target population of interest. These methods used covariate and outcomes data from the source population and covariate data from the target population of interest. Researchers also developed a statistical method called an inverse-odds weighting estimator to assess how the predictive models worked in the population of interest.
In simulation studies, researchers examined the prediction models produced by the new methods. They also compared the inverse-odds weighting estimator to a conventional unweighted alternative for assessing model performance.
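The comparison between the weighted and unweighted assessments can be sketched as follows. This is a minimal illustration, assuming a single discrete covariate so that the inverse odds of being in the source population can be computed from stratum counts. A real analysis would typically estimate these odds with a model. The function names, data layout, and numbers are hypothetical.

```python
from collections import Counter

def brier(records):
    """Unweighted mean squared error of predicted probabilities."""
    return sum((y - p) ** 2 for _, y, p in records) / len(records)

def inverse_odds_weighted_brier(source_test, target_x):
    """Estimate the Brier score in the target population from source test data.

    source_test : list of (covariate, observed outcome, predicted probability)
    target_x    : list of covariate values sampled from the target population
    """
    n_source = Counter(x for x, _, _ in source_test)
    n_target = Counter(target_x)

    # Every target stratum must also appear in the source data.
    missing = [x for x in n_target if n_source[x] == 0]
    if missing:
        raise ValueError(f"no source data for strata {missing}")

    # Weight each source record by the odds of target versus source membership.
    weights = [n_target[x] / n_source[x] for x, _, _ in source_test]
    sq_errors = [(y - p) ** 2 for _, y, p in source_test]
    return sum(w * e for w, e in zip(weights, sq_errors)) / sum(weights)

# Hypothetical toy data: the model predicts worse in stratum 0,
# which is more common in the target than in the source.
source_test = [(0, 0, 0.5), (0, 1, 0.5), (1, 0, 0.25), (1, 0, 0.25)]
target_x = [0, 0, 0, 1]

unweighted = brier(source_test)
weighted = inverse_odds_weighted_brier(source_test, target_x)
```

In this toy example the unweighted score is 0.15625 and the weighted score is 0.203125. The unweighted score understates the error the model would make in the target population because it averages over the source covariate distribution.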
Researchers then evaluated the methods using actual patient data from two nursing home facilities. Using weighted and unweighted logistic regression modeling, researchers developed prediction models for mortality using data from identified patients who had died from COVID-19. They tailored the prediction model from the larger facility, or source population, to the covariate distribution of the smaller facility, or target population of interest. Researchers evenly split the data sets from each facility into training and test data sets. They used the training data set to fit the model and used the estimator with the test data set to evaluate the prediction models’ performance.
The source population included data from 10,648 patients. Of these, 75% were White, 14% were Black, and 5% were Latino. The average age was 77, and 61% were female. The target population included data from 1,184 patients, of whom 75% were White, 3% were Black, and 8% were Latino. The average age was 76, and 55% were female.
Patients and clinicians helped design the study and interpret its results.
Results
Simulation studies. The inverse-odds weighting estimator accurately assessed model performance in the target population. Although a conventional unweighted estimator was biased, the inverse-odds weighting estimator was nearly unbiased.
Empirical study. Except for race, covariates between the two data sources were balanced. Estimated coefficients for prediction models using data from the source population were similar for weighted and unweighted logistic regression modeling approaches. Applying the models resulted in similar Brier scores in each population.
Limitations
The methods do not address issues such as measurement error or missing data in the data sources. Assumptions about the similarity between the source and target population must be met for valid results.
Conclusions and Relevance
Methods for extending statistical models can predict COVID-19-related outcomes for different patient populations.
Future Research Needs
Future research could further develop the methods to examine performance when assumptions are not met.
Peer Review Summary
The Peer-Review Summary for this COVID-19 study will be posted here soon.
Final Enhancement Report
This COVID-19 study's final enhancement report is expected to be available by June 2023.
Final Research Report
View this project's final research report.
Peer-Review Summary
Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.
The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.
Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:
- The reviewers questioned the applicability of the methods tested in this study. They noted that if a target population includes patients who would not have been eligible for the randomized trial, then the assumptions resulting from the randomized trial would not be generalizable to that target population. The researchers disagreed, stating that as long as the factors that made that target population ineligible were not factors that affected the trial assumptions, investigators could make inferences from the randomized trial to the target population.
- The reviewers noted that the report often refers to a target population in a way that readers might think that each study can only have one target population. The researchers agreed that this would be an incorrect assumption and revised the report to more clearly explain that the methods can be applied to any population, and that any one analysis would require the investigator to choose a specific group or population to test.
- The reviewers suggested that the researchers eventually provide an annotated version of the statistical code from the appendix of this report on a public platform to make their methods more accessible. The researchers agreed and explained that they had already posted annotated statistical code from this study on GitHub.com.