Results Summary
What was the project about?
Clinical prediction models, or CPMs, are statistical models that can predict a patient’s risk for a specific event, such as a health problem, adverse effect, or even death. To create a CPM, researchers use a single data set, such as data from a clinical trial. To find out whether the CPM accurately predicts risks for patients who weren’t part of the original data, researchers can test the CPM with other data sets. This testing can help researchers know if the CPM is accurate for patients from diverse backgrounds and whether it can be used to make health decisions. But few CPMs have been tested with other data sets.
In this study, the research team used other data sets to look at how well CPMs for heart disease predict patients’ risks. They also looked at how to improve CPMs.
What did the research team do?
First, the research team reviewed existing studies that tested CPMs with other data sets. They found that 58 percent of CPMs had never been tested with other data sets. Also, CPMs varied in how well they predicted patients’ risks when tested with other data sets.
Next, the research team identified 36 other data sets to test 108 heart disease CPMs. To see whether the CPMs accurately predicted patients’ risks, the team tested each CPM with these other data sets. They also looked at whether decisions about health care based on the CPMs would do more harm than good.
Finally, the research team tested three statistical methods to improve CPMs so that they predict risks more accurately.
What were the results?
When tested using other data sets, the selected heart disease CPMs often didn’t accurately predict patients’ risks. For over 80 percent of the CPMs, health decisions based on the CPM would have done more harm than good at one or more of the decision thresholds tested.
The three statistical methods helped improve the accuracy of the selected CPMs so that fewer CPMs led to decisions that would have done more harm than good.
Working with researchers and doctors, the research team created a website to share information about heart disease CPMs. The website shows how many times a CPM has been tested with other data sets and how well it predicted patients’ risks.
What were the limits of the project?
The data sets used to create the CPMs and the data sets used to test them didn’t always contain the same information. As a result, the research team could only test 108 CPMs. Findings may have differed if the team had tested other CPMs.
Future research could look at why CPM results vary across different patient data sets.
How can people use the results?
Researchers can use the results to test and improve CPMs. Doctors can use the website to learn about heart disease CPMs.
Professional Abstract
Background
Clinical prediction models (CPMs) estimate a patient’s risk for a particular outcome, such as onset of disease or death, and can inform clinical decisions so that they are consistent with a patient’s risks, values, and preferences. CPMs rely on existing patient data, such as clinical and laboratory information. However, most CPMs have not been validated in external cohorts using independent data sets that were not used to create the CPMs. Using data from external cohorts can help determine whether the CPMs accurately predict risks for diverse patient populations to support clinical decision making.
Objective
(1) To understand how well cardiovascular CPMs perform using data from external cohorts; (2) To test the effectiveness of updating procedures to improve risk predictions from CPMs
Study Design
| Design Element | Description |
|---|---|
| Design | Systematic review; empirical analysis |
| Data Sources and Data Sets | Published studies of 1,382 cardiovascular CPMs (systematic review); 36 clinical data sets used to externally validate 108 CPMs |
| Analytic Approach | External validation of CPMs in independent cohorts; three statistical updating procedures to improve predictions |
| Outcomes | Discrimination, calibration error, net benefit |
Methods
The research team first conducted a systematic review of 1,382 CPMs to assess how frequently CPMs are validated and how their performance varies across external cohorts. Then the team used 36 clinical data sets to validate 108 CPMs for three health conditions: acute coronary syndrome, heart failure, and incident cardiovascular disease. To assess CPM performance, the team used three metrics, illustrated in the code sketch after this list:
- Discrimination, or how well a CPM separates patients with the outcome from those without the outcome.
- Calibration error, or the difference between the observed outcome rate and the estimated probabilities.
- Net benefit, or a measure of whether basing clinical decisions on the CPM would do more good than harm at different risk thresholds. The risk thresholds were the observed rate of an outcome, twice the observed rate, and half the observed rate.
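As a concrete illustration of these three metrics, the sketch below computes simple versions of each from a vector of predicted risks and observed outcomes. This is not the study’s code: the function names are hypothetical, and standardizing the calibration error by the average observed risk is our reading of the report’s “half the average risk” phrasing.

```python
# Illustrative sketch of the three validation metrics; not the study's code.
# y_true, y_pred: NumPy arrays of observed outcomes (0/1) and predicted risks.
import numpy as np
from sklearn.metrics import roc_auc_score

def discrimination(y_true, y_pred):
    """C-statistic (AUC): the probability that a patient with the outcome
    receives a higher predicted risk than a patient without it."""
    return roc_auc_score(y_true, y_pred)

def standardized_calibration_error(y_true, y_pred, n_bins=10):
    """Average gap between observed outcome rate and mean predicted risk
    within risk-decile bins, divided by the overall observed risk
    (an assumed standardization; the report's exact formula may differ)."""
    edges = np.quantile(y_pred, np.linspace(0, 1, n_bins + 1))
    bin_idx = np.clip(np.digitize(y_pred, edges[1:-1]), 0, n_bins - 1)
    gaps = [abs(y_true[bin_idx == b].mean() - y_pred[bin_idx == b].mean())
            for b in range(n_bins) if np.any(bin_idx == b)]
    return np.mean(gaps) / y_true.mean()

def net_benefit(y_true, y_pred, threshold):
    """Decision-curve net benefit at a risk threshold:
    TP/n - FP/n * threshold / (1 - threshold).
    Negative values mean model-based decisions do net harm."""
    treat = y_pred >= threshold
    n = len(y_true)
    tp = np.sum(treat & (y_true == 1)) / n
    fp = np.sum(treat & (y_true == 0)) / n
    return tp - fp * threshold / (1 - threshold)

# The study examined three thresholds per outcome:
# rate = y_true.mean(); thresholds = (0.5 * rate, rate, 2 * rate)
```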
The research team tested three statistical updating procedures to improve CPM predictions.
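The report does not name the three procedures at this point, but a common updating hierarchy in the CPM literature runs from refitting the intercept, to refitting the intercept and calibration slope, to full re-estimation of the coefficients. As a hedged sketch under that assumption, the code below implements the middle step, logistic recalibration on a validation cohort; it may differ from the procedures the team actually used.

```python
# Hedged sketch of logistic recalibration (one common CPM-updating
# procedure); the study's three procedures may differ.
import numpy as np
from sklearn.linear_model import LogisticRegression

def logistic_recalibration(original_risk, y_new):
    """Refit an intercept and slope for the original model's linear
    predictor (log-odds) using outcomes from the new cohort."""
    p = np.clip(original_risk, 1e-9, 1 - 1e-9)
    lp = np.log(p / (1 - p)).reshape(-1, 1)   # original linear predictor
    # A large C approximates unpenalized logistic regression.
    recal = LogisticRegression(C=1e6).fit(lp, y_new)
    return recal  # recal.predict_proba(lp)[:, 1] yields updated risks
```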
With input from a stakeholder panel that included researchers, clinicians, industry experts, and patient advocates, the research team created a website with a comprehensive database of each CPM’s performance.
Results
Systematic review. Of the 1,382 CPMs, 58% had never been validated in external cohorts. Validated CPMs varied widely in discrimination across cohorts, and discrimination was worse when the external cohort differed clinically from the population used to develop the CPM.
Validations. The 108 CPMs had worse discrimination in the external cohorts than in the cohorts used to create them. The typical standardized calibration error was 0.5, or half the average risk; for example, if the average observed risk were 10%, predictions would typically be off by about 5 percentage points. Over 80% of models showed a potential for harm at one or more of the three thresholds examined.
Testing updating procedures. The three updating procedures reduced the number of CPMs yielding a negative net benefit.
Limitations
Because many CPMs were not compatible with the data sets that were available for validation, the study used a convenience sample of 108 CPMs. Findings may not generalize to all cardiovascular CPMs.
Conclusions and Relevance
When CPMs are used without updating, there appears to be a high risk of harm. Updating CPMs through validation in external cohorts can help improve risk predictions and clinical decision making when using CPMs.
Future Research Needs
Future research could investigate the reasons for the wide variation in CPM performance in external cohorts.
COVID-19-Related Study
Developing and Testing Models for COVID-19 Health Outcomes
Results Summary
In response to the COVID-19 public health crisis in 2020, PCORI launched an initiative to enhance existing research projects so that they could offer findings related to COVID-19. The initiative funded this study and others.
What was this COVID-19 study about?
Statistical models can predict a patient’s risk for a health problem, a health outcome, or even death. Researchers may create models using patient data from a single health system. But patients in other places may have different traits. Also, treatment might change over time.
In this study, the research team created two sets of models to predict health outcomes related to COVID-19 to help patients and doctors make treatment decisions:
- Northwell COVID-19 Survival, or NOCOS. This set of models used data from patients admitted to hospitals with COVID-19 in New York City from March to August 2020.
- COVID Outcome Prediction in the Emergency Department, or COPE. This set of models used data from patients admitted to hospitals with COVID-19 in the Netherlands from March to August 2020.
Then the research team tested the models to see how well they could predict patient risks when used with patients in other places or time periods.
What were the results?
Both NOCOS and COPE performed well at identifying patients at high risk for death. On average, NOCOS predicted a higher risk of death than what really occurred.
Both models performed well at identifying patients who needed to be on a ventilator or in intensive care. On average, COPE predicted a higher risk for needing these treatments than what occurred.
Neither model performed well in predicting length of stay in intensive care or death after being on a ventilator.
Who was in the study?
The study included data from 23,383 patients receiving care for COVID-19 at hospitals in the New York City area and the Netherlands. The median patient age ranged from 65 to 71; 57 percent of patients were men.
What did the research team do?
To test whether the models could predict risks for patients with COVID-19 at different time periods, the research team used data from September to December 2020. To test whether the models could predict risks for patients outside the health system location where each model was developed, the team used data from the other location.
Clinicians, patients, and their caregivers gave input about their concerns and the usefulness of the models.
What were the limits of the study?
COVID-19, the public health response to COVID-19, and vaccine availability continue to change. Models may not perform as well in the future.
How can people use the results?
Doctors and patients could use these models when considering health risks and treatments for patients with COVID-19.
Professional Abstract
Background
COVID-19 patients who are hospitalized face uncertain clinical outcomes, including symptom severity, disease trajectory, and mortality risk. Clinical prediction models (CPMs) that accurately predict disease progression can support treatment decision making, such as whether to undergo invasive mechanical ventilation. CPMs typically rely on existing data from a single healthcare system. However, patient characteristics vary across locations. In addition, COVID-19 and its treatment may change over time. Knowing if CPMs work with data from different locations or time periods can help determine whether CPMs are useful to support decision making.
Objective
(1) To develop CPMs to predict mortality and disease progression for patients hospitalized for COVID-19; (2) To evaluate how well the CPMs predict COVID-19 outcomes using data from different locations and time periods
Study Design
| Design Element | Description |
|---|---|
| Design | Model development (logistic regression, LASSO regression), model evaluation |
| Population | Data from 23,383 patients who presented at the emergency department and were admitted to the hospital for COVID-19 at 13 Northwell Health clinics in the New York City area and at 4 hospitals in the Netherlands |
| Outcomes | Mortality; need for mechanical ventilation or intensive care unit (ICU) admission; mortality following mechanical ventilation; length of stay in the ICU |
| Timeframe | March–August 2020 (first wave); September–December 2020 (second wave) |
Methods
Researchers used existing data from 23,383 patients who received care for COVID-19 at 13 clinics in the New York City area and 4 Dutch hospitals. The median age ranged from 65 to 71 across the four samples; 57% of patients were male. Researchers built two sets of CPMs. Each set predicted four COVID-19-related outcomes:
- Mortality
- Need for mechanical ventilation or intensive care unit (ICU) admission
- Mortality following mechanical ventilation
- Length of stay in the ICU
Researchers grouped data from patients by wave and location. The first set of models, called Northwell COVID-19 Survival (NOCOS) models, used first-wave data from New York City. The second set, called COVID Outcome Prediction in the Emergency Department (COPE) models, used first-wave data from the Netherlands. Researchers then evaluated the CPMs using data from the second wave or data from the other location.
To develop the NOCOS models, researchers used least absolute shrinkage and selection operator (LASSO) regression and logistic regression. For the COPE models, researchers used logistic regression. They then evaluated the CPMs using a framework developed in a previous PCORI project to assess discrimination, calibration, and net benefit.
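A rough sketch of this develop-then-transport workflow appears below. It is our illustration rather than the study’s code: the function signature is hypothetical, and the LASSO penalty strength shown is an arbitrary placeholder that would be tuned in practice.

```python
# Hedged sketch: develop a model on one wave/location, then check
# discrimination on other waves or locations; not the study's code.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def develop_and_transport(X_dev, y_dev, validation_sets):
    """Fit a LASSO (L1-penalized) logistic model, as for NOCOS (COPE
    used ordinary logistic regression), then report AUC on each
    validation cohort, e.g., a later wave or the other location."""
    model = LogisticRegression(penalty="l1", solver="liblinear",
                               C=0.1).fit(X_dev, y_dev)  # C: placeholder
    for label, (X_val, y_val) in validation_sets.items():
        auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        print(f"{label}: discrimination (AUC) = {auc:.2f}")
    return model

# Example call shape (data objects are hypothetical):
# develop_and_transport(X_nyc_w1, y_nyc_w1, {
#     "NYC, second wave": (X_nyc_w2, y_nyc_w2),
#     "Netherlands, first wave": (X_nl_w1, y_nl_w1),
# })
```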
Clinicians, patients, and their caregivers gave input about their concerns and the usefulness of the CPMs.
Results
NOCOS and COPE had satisfactory discrimination in predicting mortality using data from different locations and times. NOCOS systematically overestimated mortality risk. COPE had excellent calibration across time but overpredicted mortality risk across location. Recalibration of NOCOS and COPE led to substantial net benefit improvement in New York City data.
NOCOS and COPE had high discrimination in identifying patients needing mechanical ventilation or ICU admission. NOCOS had good calibration, but COPE overestimated the need for mechanical ventilation or ICU admission.
Both NOCOS and COPE performed poorly in predicting mortality after mechanical ventilation and length of stay in the ICU.
Limitations
Because COVID-19, the public health response to COVID-19, and vaccine availability continue to change, results may not apply to patients diagnosed in later COVID-19 waves.
Conclusions and Relevance
Both NOCOS and COPE identified patients at risk for mortality or needing intensive care. COPE predicted mortality risk better than NOCOS, but NOCOS was better at predicting the need for treatment. CPMs require frequent updating to ensure accuracy of results across different locations and over time.
Peer Review Summary
Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.
The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.
Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:
- The reviewers expressed concern about the amount of missing data in this study and asked the researchers to describe the limitations associated with this level of missingness. The researchers acknowledged that some predictor characteristics were often missing in the retrospective COVID-19 database from the Netherlands. They added a discussion to the report of the potential loss of precision in predictor effect estimates as well as the potential for bias related to questionable use of the missing-at-random assumption related to imputation of the missing data. The researchers still considered the available information to be sufficiently robust for precise prediction estimates, especially given how common missing data are for these types of studies.
- The reviewers also asked the researchers to describe how clinical prediction models could be used in clinical practice with individuals who have missing data on some of the prediction variables. The researchers stated that the use of such models for individuals with missing data is outside the scope of this paper but also that the values used in their prediction model could usually be obtained from individuals in the emergency department. They went on to say that a common approach to dealing with missing information in the electronic health record is to model the missing variables based on other available data, but that this approach could introduce bias.
- The reviewers suggested some alternative prediction models that the researchers could use in future research. The researchers agreed about potential future analyses based on research questions that came out of this study, including looking at the incremental value of each predictor variable to the prediction model and testing the models on a second round of data.
Final Enhancement Report
View this COVID-19 study's final enhancement report.
DOI - Digital Object Identifier: 10.25302/03.2023.ME.160635555-C19
Final Research Report
View this project's final research report.
Peer Review Summary
Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:
- The reviewers were generally laudatory in their comments about this study and the potential for the prediction modeling methods.
- The reviewers asked the researchers to explain their use of the term harm in relation to clinical prediction models, because harm in clinical research typically refers to adverse patient outcomes. The researchers defined harm as a negative result from their decision curve analyses, indicating that using the clinical prediction model to make treatment decisions had net harm rather than net benefit.
Project Information
Patient / Caregiver Partners
- Bray Patrick-Lake, MFS, Director, Stakeholder Engagement, CTTI; Director, Patient Engagement, Duke CTSA; Co-chair, NIH Advisory Committee to the Director Working Group, Precision Medicine Initiative
Other Stakeholder Partners
- Gary S. Collins, PhD, Deputy Director, Centre for Statistics in Medicine, University of Oxford
- William H. Crown, PhD, Chief Scientific Officer, OptumLabs
- John K. Cuddeback, MD, PhD, Chief Medical Informatics Officer, Anceta Data Warehouse, AMGA
- Dana G. Safran, ScD, Senior Vice President, Performance Measurement and Improvement, BCBS of MA
- John A. Spertus, MD, MPH, Chief Medical Officer/Director, Health Outcomes Sciences LLC; Cardiologist, Professor of Medicine, Mid America Heart Institute, U of Missouri-Kansas City; Deputy Editor, Circ Cardiovasc Qual Outcomes
- James E. Udelson, MD, Chief of Cardiology, Tufts Medical Center; Professor of Medicine, Radiology, Tufts University School of Medicine