Results Summary
What was the project about?
Patients with high healthcare needs include those with multiple health problems, like diabetes and heart disease. Current methods to identify patients with high healthcare needs aren’t always accurate. These methods may miss some patients such as those with mental illness or social risks like unstable housing. Better methods could help doctors make sure these patients are getting the care they need.
In this study, the research team developed a method to predict which patients may have high healthcare needs.
What did the research team do?
First, the research team created a new system for identifying patients who have high healthcare needs. The team organized patients into 10 groups based on different patient traits, health problems, and risks, such as chronic pain, mental illness, and unemployment. The team then tested the system to see if it could identify patients with high healthcare needs. To do so, the team used data from 428,024 patients receiving care at New York City health centers. They also used data about the neighborhoods where patients lived. All patients had Medicare insurance.
Using the 10 groups, the research team created a statistical method to predict if a patient would have high healthcare needs. The team checked whether adding the different groups and other patient data, like age and gender, made the method more accurate. They tested the method with data from 1,074,389 patients receiving care from New York City health centers from 2013 to 2016. The team confirmed the results using data from health systems in Florida.
Patients, caregivers, doctors, and health system administrators helped design the study.
What were the results?
The new grouping system identified 99 percent of patients with high healthcare needs.
The statistical method predicted which patients were likely to have high healthcare needs. The method was most accurate when it had patient data on all 10 groups plus age and gender.
What were the limits of the project?
The research team used data from patients with Medicare insurance in two states. The results may differ for patients with other types of insurance or from other locations.
Future studies could test the method using data from patients with other types of insurance or in other regions.
How can people use the results?
Researchers and doctors can use the methods to help make sure patients with high healthcare needs get the care they need.
How this project fits under PCORI’s Research Priorities

The PCORnet® Study reported in this results summary was conducted using PCORnet®, the National Patient-Centered Clinical Research Network. PCORnet® is intended to improve the nation’s capacity to conduct health research, particularly comparative effectiveness research (CER), efficiently by creating a large, highly representative network for conducting clinical outcomes research. PCORnet® has been developed with funding from the Patient-Centered Outcomes Research Institute® (PCORI®).
Professional Abstract
Background
Identifying which patients have high healthcare needs and utilization is important for providing patients with personalized care and reducing avoidable hospitalizations. But existing methods do not fully capture the range and variation in patient characteristics that may be associated with higher needs and healthcare use. For example, most methods do not identify patients with overlapping health problems or complex health and social needs. A better classification system, or taxonomy, and statistical models can help clinicians identify and predict which patients may have high healthcare needs in the future.
Objective
To develop and validate a statistical model based on a taxonomy of patients with high healthcare needs and utilization to predict future healthcare needs and utilization
Study Design
Design Element | Description |
---|---|
Design | Empirical analysis |
Population | 428,024 patients with Medicare insurance receiving care through the INSIGHT PCORnet® Clinical Research Network in New York City (taxonomy development); 1,074,389 patients with Medicare insurance from INSIGHT (prediction model testing), with validation using data from the OneFlorida PCORnet Clinical Research Network |
Analytic Approach | Prediction model development using logistic regression |
Outcomes | Predicting risk of greater healthcare needs and utilization; model performance using the area under the receiver operating characteristic curve (AUC), Brier score, and calibration plots |
Methods
First, using data from prior research, investigators developed a taxonomy of 10 overlapping categories to classify patients with high healthcare needs and utilization. Researchers looked at whether the categories could be identified using data on clinical diagnoses and healthcare utilization from claims and electronic health records (EHRs) for 428,024 patients with Medicare insurance from the INSIGHT PCORnet® Clinical Research Network in New York City. Researchers also added data from the American Community Survey on the socioeconomic status of patients’ neighborhoods, including income, education, employment, and housing quality.
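To make the classification step concrete, here is a minimal sketch of how overlapping taxonomy categories could be flagged from coded diagnosis data; the category names and code groupings below are hypothetical illustrations, not the study's actual definitions.

```python
# Minimal sketch: flag overlapping taxonomy categories from coded diagnoses.
# Category names and code groupings are illustrative only, not the study's definitions.
from typing import Dict, List, Set

# Hypothetical mapping of taxonomy categories to diagnosis-code prefixes (ICD-10 style)
CATEGORY_CODES: Dict[str, Set[str]] = {
    "chronic_pain": {"G89", "M54"},
    "serious_mental_illness": {"F20", "F31", "F33"},
    "diabetes": {"E11"},
    "heart_disease": {"I25", "I50"},
}

def assign_categories(patient_dx: List[str]) -> Set[str]:
    """Return every category whose code prefixes match any of the patient's diagnoses.
    Categories overlap, so one patient can belong to several."""
    assigned = set()
    for category, prefixes in CATEGORY_CODES.items():
        if any(dx.startswith(p) for dx in patient_dx for p in prefixes):
            assigned.add(category)
    return assigned

# Example: a patient with diabetes and depression codes falls into two categories
print(assign_categories(["E11.9", "F33.1", "Z59.0"]))  # {'diabetes', 'serious_mental_illness'}
```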
Then researchers used the 10 taxonomy categories as predictor variables in statistical models to determine a patient’s risk for increased future healthcare needs and utilization. To develop and test different prediction models, they varied the combination of taxonomy categories and other types of variables, like demographics and social conditions. Researchers then tested the prediction models using longitudinal data from 2013 to 2016 for 1,074,389 patients with Medicare from INSIGHT. They developed the models using both logistic regression and machine learning approaches and assessed each model’s accuracy and performance. They then validated the models using data from the OneFlorida PCORnet Clinical Research Network.
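As an illustration of this modeling workflow, the following sketch fits a logistic regression on binary taxonomy-category indicators plus demographics and reports AUC and Brier score; the variable names and simulated data are hypothetical stand-ins for the claims and EHR data the study used.

```python
# Sketch: logistic regression on taxonomy-category indicators plus demographics,
# evaluated with AUC and Brier score. Column names and data are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
# Simulated stand-in data: 10 binary taxonomy categories, age, sex, and outcome
df = pd.DataFrame({f"cat_{i}": rng.integers(0, 2, n) for i in range(1, 11)})
df["age"] = rng.integers(18, 95, n)
df["female"] = rng.integers(0, 2, n)
logit = -3 + 0.4 * df[[f"cat_{i}" for i in range(1, 11)]].sum(axis=1) + 0.02 * (df["age"] - 60)
df["high_need"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X, y = df.drop(columns="high_need"), df["high_need"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
p = model.predict_proba(X_test)[:, 1]
print(f"AUC: {roc_auc_score(y_test, p):.2f}, Brier: {brier_score_loss(y_test, p):.3f}")
```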
Patients, caregivers, clinicians, and health system administrators helped design the study.
Results
Using claims and EHR data, researchers found that the 10 taxonomy categories identified 99% of all patients with high healthcare needs and utilization.
Among the different statistical prediction models tested, the logistic regression-based prediction model with all 10 taxonomy categories and demographics performed the best, with good discrimination (AUC 0.71–0.77) and accuracy (Brier score 0.19–0.21). The model correctly predicted patients with high healthcare needs and utilization in each year. Adding more predictive variables to the model did not improve accuracy.
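For readers interpreting these numbers, the Brier score is the mean squared difference between each patient's predicted probability and the observed outcome, with lower values indicating more accurate, better-calibrated predictions:

$$\text{Brier score} = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - y_i\right)^2$$

where $p_i$ is the predicted probability of high needs and utilization for patient $i$ and $y_i \in \{0, 1\}$ is the observed outcome.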
Limitations
Researchers used data from Medicare claims in New York City and Florida to develop the taxonomy and predictive models. Results may differ with data from patients in other locations or with other types of insurance.
Conclusions and Relevance
The statistical model helped predict which patients had high healthcare needs and utilization. Health systems could use the model to help personalize care for these patients.
Future Research Needs
Future research could test the statistical model’s performance using other clinical data such as lab test results or data for patients who live in other regions or have other types of insurance.
COVID-19-Related Study
Developing Models to Predict Outcomes for Patients with COVID-19 – A PCORnet® Study
Results Summary
In response to the COVID-19 public health crisis in 2020, PCORI launched an initiative to enhance existing research projects so that they could offer findings related to COVID-19. The initiative funded this study and others.
What was this COVID-19 study about?
COVID-19 is a viral illness that can cause severe health problems and death. Doctors and patients are still learning about the best ways to treat this illness. One intervention for patients with severe COVID-19 is intubation. With intubation, a doctor puts a tube down the patient’s throat to open their airway and help them breathe. Doctors don’t always know when to recommend intubation because they don’t know which patients will benefit from it.
Statistical models can predict a patient’s risk for a specific event, such as a health problem, adverse effect, or even death. They may help doctors figure out which patients may improve with intubation and which patients may die from an illness. But current models were created using small studies and may not work for all patients with COVID-19. Better models may help doctors and patients decide which treatments may work best.
In this project, the research team wanted to develop models to predict:
- Which patients will benefit from intubation
- Which patients are at high risk for dying from COVID-19
What were the results?
The research team used data from 30,016 patients with COVID-19. Of these, 18,762 were admitted to the hospital, 2,902 were intubated, and 3,554 died.
The three prediction models the research team developed helped predict which patients were more likely to benefit from intubation and which patients were more likely to die. Accounting for changes in the effect of personal traits over time improved the accuracy of the models. Adding data on neighborhood social conditions did not improve the models.
Who was in the study?
The data used in the research were from electronic health records for patients with COVID-19 who were admitted to emergency rooms or hospitals in New York City between March 1, 2020, and February 8, 2021. Among the patients, 64 percent were non-White, 28 percent were White, and 8 percent were other or unknown race; 36 percent were Hispanic. The median age was 60, and 51 percent were men.
What did the research team do?
The research team used the data to develop three prediction models. The team looked at data on patient traits, such as age, gender, race, other health conditions, and health status. They added data on patients’ neighborhood social conditions, like employment rates and home and car ownership. They also looked at whether the effect of personal traits changed over time.
What were the limits of the study?
The research team used data from patients in New York City during the first two waves of COVID-19. The results may differ for patients in other locations or for new waves of COVID-19.
How can people use the results?
Doctors can use the models when considering treatments for patients with severe COVID-19.
Professional Abstract
In response to the COVID-19 public health crisis in 2020, PCORI launched an initiative to enhance existing research projects so that they could offer findings related to COVID-19. The initiative funded this study and others.
Background
Because COVID-19 is a new illness, little evidence is available to inform patients’ and clinicians’ care decisions. For example, clinicians may not know when to recommend intubation for patients with severe COVID-19 because they do not know which patients will benefit from intubation.
Statistical methods such as predictive modeling can account for factors that affect patient health outcomes and can guide decision making. But most existing models used to predict COVID-19-related outcomes are not generalizable to all patients because they are based on small samples, include data from a select group of patients, or do not consider post-discharge data. Better predictive models could help clinicians identify patients who are at high risk for death from COVID-19 and inform treatment options.
Objective
To develop models that predict the risk for intubation and mortality among hospitalized patients with COVID-19
Study Design
Design Element | Description |
---|---|
Design | Observational: retrospective cohort study |
Population | INSIGHT Clinical Research Network EHR data from 30,016 patients ages 18 and older with COVID-19 who were admitted to emergency departments or hospitals in New York City |
Outcomes | Patient outcomes: intubation and mortality due to COVID-19; model performance outcome: area under the receiver operating characteristic (AUROC) curve |
Data Collection Timeframe | March 1, 2020–February 8, 2021 |
Methods

To develop alternative models to predict intubation and mortality risk among patients with COVID-19, researchers used three methods: logistic regression, random forests, and Classification and Regression Trees (CART). To assess model performance, researchers compared the area under the receiver operating characteristic (AUROC) curves across different prediction specifications and methods.
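As a rough illustration of this kind of comparison, the sketch below fits the three model families on simulated placeholder data and compares them by AUROC, using scikit-learn's DecisionTreeClassifier as a CART-style learner; none of the variables correspond to the study's EHR data.

```python
# Sketch: compare logistic regression, random forest, and a CART-style tree by AUROC.
# Simulated placeholder data; not the study's EHR variables.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Imbalanced binary outcome to loosely mimic a relatively uncommon event
X, y = make_classification(n_samples=4000, n_features=20, weights=[0.85], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "CART-style tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}
for name, m in models.items():
    p = m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    print(f"{name}: AUROC = {roc_auc_score(y_test, p):.2f}")
```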
Researchers used electronic health record (EHR) data to identify risk factors for intubation and mortality such as age, sex, chronic health conditions, and vital signs. They included data on neighborhood characteristics, such as rates of employment and home ownership, in the predictive models. They also examined how the effect of the factors on outcomes changed over time.
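One common way to let a factor's effect change over time, in the spirit of the approach described above, is to interact predictors with calendar-period indicators; the sketch below is purely illustrative, with hypothetical cutoff dates and variable names rather than the study's actual specification.

```python
# Sketch: let a predictor's effect vary by pandemic period via interaction terms.
# Period cutoff and variable names are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "admit_date": pd.to_datetime(["2020-03-15", "2020-07-01", "2021-01-20"]),
    "age": [55, 72, 64],
})
# Indicator for a later pandemic period (illustrative cutoff only)
df["wave2"] = (df["admit_date"] >= "2020-10-01").astype(int)
# Interaction term: the age effect is allowed to differ between periods
df["age_x_wave2"] = df["age"] * df["wave2"]
print(df)
```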
The study used EHR data from 30,016 patients with COVID-19 who were receiving care at one of five health systems in New York City. Among patients, 64% were non-White, 28% were White, and 8% were other or unknown race; 36% were Hispanic. The median age was 60, and 51% were male.
Clinician researchers treating patients with COVID-19 helped design the study.
Results
Of the patients in the study, 18,762 were hospitalized, 2,902 were intubated, and 3,554 died during the study. The logistic regression and random forest models performed better and predicted these outcomes more accurately than the CART model. The AUROC for the models predicting intubation was 0.66–0.74 for logistic regression, 0.66–0.73 for random forests, and 0.53–0.54 for CART. The AUROC for the models predicting mortality was 0.78–0.85 for logistic regression, 0.79–0.85 for random forests, and 0.64–0.68 for CART.
Accounting for changes in the effect of patient factors over time improved prediction accuracy across models. Including neighborhood characteristics did not improve prediction accuracy.
Limitations
Researchers used data from patients in New York City who were infected with the Alpha variant of the virus that causes COVID-19. Findings may not be generalizable to other locations or to patients infected with newer variants as the virus continues to evolve.
Conclusions and Relevance
Clinicians can use the predictive models to identify patients with COVID-19 who are likely to benefit from intubation and patients who are at high risk for death. The results may help inform treatment decisions.
Peer Review Summary
Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.
The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.
Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:
- The reviewers asked whether the researchers included different treatments that were used in the early days of the COVID-19 pandemic as predictors in their analytic models or whether the researchers used time periods as a proxy for changing clinical responses to COVID-19. The researchers responded that in the early days of the pandemic, treatment options were quite limited, but they included those early time periods to demonstrate the change in risk of severe outcomes that occurred due to availability of treatments, increasing familiarity with the illness, and availability of hospital beds. The researchers also acknowledged in the report that their data were collected before vaccines were widely available, so their models should be reevaluated in vaccinated cohorts.
- The reviewers expressed concern that the report did not include enough information about how the researchers handled missing data, even though the report noted that missing values ranged from 1% to 70%. The reviewers were particularly concerned about the handling of missing predictor data, as that could result in a high risk of bias in the analyses. The researchers noted that the rate of missing data among the predictor variables was 3.6% to 12%. They also added information to the report about the imputations they conducted to account for the missing data.
- The reviewers asked the researchers to clarify whether they captured only deaths occurring while hospitalized for COVID-19 when the report referred to 60-day mortality. The researchers indicated that 95% of deaths in their data did occur while patients were in the hospital, so they did use in-hospital mortality in their analyses. They acknowledged in the report that deaths occurring in the community or in other health systems might have been underreported since most participating systems did not count these deaths.
Final Enhancement Report
View this COVID-19 study's final enhancement report.
DOI - Digital Object Identifier: 10.25302/01.2023.HSD.160435187_C19
Final Research Report
View this project's final research report.
Journal Citations
Related Journal Citations
Peer Review Summary
Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.
The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.
Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:
- The reviewers pointed out that there is a distinction between patients with high health needs and patients with high health costs: research has demonstrated that patients from underrepresented minority groups often face discrimination and reduced access to health care, so high-needs patients from these communities may not have high healthcare costs. They recommended that the researchers identify high-needs patients using the number of chronic conditions as a proxy rather than costs. The researchers agreed with this consideration and analyzed the data based on this premise to determine whether their results remained consistent. They compared the number of chronic conditions by race and socioeconomic status at each level of predicted cost-based patient risk and found no biases based on race or socioeconomic status in their results. However, they acknowledged that the databases they used to capture health needs may themselves contain biased information because the data may be incomplete for patients from disadvantaged groups with less access to care.
- The reviewers noted that the machine-learning models performed poorly compared with logistic regression in predicting healthcare costs. The researchers agreed that the machine-learning models did not perform well and agreed with the reviewers that this was probably because those models were built using only 10 predictor variables.
- The reviewers asked the researchers to discuss their proposed approach to predicting high-cost patients in comparison to other prediction methods that the researchers mentioned in the background section of the report. The researchers pointed out that they could not readily compare their methods to some of the other prediction methods because of different definitions for what percent of annual health spending constituted high cost, different populations of patients, and different goals for developing the taxonomy predicting high-cost patients.