Results Summary
What was the project about?
Patients may respond differently to the same treatment due to individual traits such as age or gender. Knowing how different traits can affect a patient’s response to treatment can help doctors and patients make better treatment decisions. For example, this information can help doctors know what types of cancer medicines work better for certain patients. This project focuses on improving the methods that researchers use to compare how treatments work for different patients.
In this project, the research team developed and tested a statistical method called random forests, or RF. RF is a way to analyze data using a technique called machine learning. In machine learning, computers use data to learn how to perform different tasks with little or no human input. Many types of RF methods exist. The team compared multiple RF methods to learn how well each could show how patients with different traits respond to the same treatment.
What did the research team do?
The research team tested the different RF methods using data created with a computer program. They compared the RF methods with each other, with other machine learning methods, and with methods not based on machine learning to see which were more accurate and precise. The team also tested the methods using real data.
A group of doctors, other researchers, and staff from a public health department helped design the study.
What were the results?
The research team found that the RF methods worked better than methods not based on machine learning to show how different patients respond to treatments. Some RF methods were more accurate and precise than others. In some cases, non-RF methods worked as well as the RF methods.
What were the limits of the project?
The research team only looked at one type of machine learning method. Future research could look at machine learning methods not used in the study and test their use in treatment decisions in real-world settings.
How can people use the results?
Researchers can use RF methods in clinical research to learn how patients with different traits may respond to treatments.
Professional Abstract
Background
Heterogeneity of treatment effect (HTE) refers to observed variation in patient responses to the same treatment. HTE may reflect differences in patient characteristics such as gender, health behavior, or comorbidity. Improved understanding of HTE can help clinicians and patients make more informed, individualized treatment decisions.
Researchers can assess HTE by estimating the treatment effect for each patient, also known as the individual treatment effect. Machine learning methods, statistical approaches that automate the detection of relationships among variables, can estimate individual treatment effects. This study evaluated one type of machine learning technique known as random forests (RF) for estimating these effects.
Objective
To develop and evaluate RF methods for estimating individual treatment effects
Study Design
| Design Elements | Description |
|---|---|
| Design | Simulation studies, empirical analysis |
| Data Sources and Data Sets | Simulated data, observational data |
| Analytic Approach | Comparison of multiple RF methods with other statistical methods in estimating individual treatment effects, including comparing performance of the different RF methods using mean bias and RMSE |
Methods
Researchers tested multiple RF methods in four studies, including three studies using simulated data sets and one study using empirical data. In the first study using data that simulated randomized controlled trials, they compared RF to other methods of estimating individual treatment effects, including multiple imputation and Bayesian Additive Regression Trees (BART). In the remaining studies, they tested the performance of various RF methods in scenarios varying the magnitude of HTE in randomized trials, using potentially confounded observational data, and accommodating survival outcomes and multiple treatment comparisons.
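The counterfactual-RF idea behind these comparisons can be sketched as follows. This is a minimal illustration of the general approach (sometimes called a "T-learner"), not the authors' implementation: fit a separate forest to each treatment arm, predict both potential outcomes for every patient, and take the difference as that patient's estimated individual treatment effect. All variable names and the simulated data are illustrative, and scikit-learn stands in for whatever software the study actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Simulated trial: the outcome depends on a covariate-by-treatment
# interaction, so the true individual treatment effect varies by patient.
n = 2000
X = rng.normal(size=(n, 5))
treat = rng.integers(0, 2, size=n)
true_ite = 1.0 + X[:, 0]            # effect grows with the first covariate
y = X[:, 1] + treat * true_ite + rng.normal(scale=0.5, size=n)

# Counterfactual idea: one forest per arm, then predict both potential
# outcomes for every patient, treated or not.
rf_treated = RandomForestRegressor(n_estimators=200, random_state=0)
rf_control = RandomForestRegressor(n_estimators=200, random_state=0)
rf_treated.fit(X[treat == 1], y[treat == 1])
rf_control.fit(X[treat == 0], y[treat == 0])

# Estimated individual treatment effect = difference of the two predictions.
ite_hat = rf_treated.predict(X) - rf_control.predict(X)
print("estimated mean effect:", ite_hat.mean())
```

Variants in the study, such as synthetic counterfactual RF, refine how the counterfactual predictions are constructed, but the core step is the same: predict both potential outcomes per patient and difference them.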
An advisory board made up of public health department representatives, infectious disease doctors, and community researchers provided input during the study.
Results
Researchers found that
- In simulated randomized trials, RF and BART had less mean bias than multiple imputation, while BART had the smallest root mean squared error (RMSE) in estimating individual treatment effects.
- In simulated randomized trials, RF methods performed worse as the magnitude of HTE increased. Compared with counterfactual RF, synthetic counterfactual RF estimates had less bias and smaller RMSE.
- In simulated observational studies, among several RF approaches, synthetic counterfactual RF had the least bias and smallest RMSE.
- Using empirical data, researchers were able to use RF with survival outcomes and multiple treatment groups to estimate individual treatment effects. The multivariate RF method had less bias than both the RF classification with calibration method and the RF distance with calibration method.
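The two performance metrics used throughout these comparisons, mean bias and RMSE, are simple to compute when the true individual treatment effects are known, as they are in simulated data. The sketch below is illustrative; the names and toy estimators are not from the study.

```python
import numpy as np

def mean_bias(true_ite, est_ite):
    """Average signed error of individual-treatment-effect estimates."""
    return np.mean(np.asarray(est_ite) - np.asarray(true_ite))

def rmse(true_ite, est_ite):
    """Root mean squared error: also penalizes large per-patient errors."""
    return np.sqrt(np.mean((np.asarray(est_ite) - np.asarray(true_ite)) ** 2))

# Toy comparison: an unbiased-but-noisy estimator vs. a biased-but-exact one.
rng = np.random.default_rng(0)
true_ite = rng.normal(1.0, 0.5, size=1000)
noisy = true_ite + rng.normal(0.0, 0.3, size=1000)  # unbiased, noisy
shifted = true_ite + 0.2                            # biased by +0.2, no noise

print("noisy:  ", mean_bias(true_ite, noisy), rmse(true_ite, noisy))
print("shifted:", mean_bias(true_ite, shifted), rmse(true_ite, shifted))
```

Reporting both metrics matters because they can disagree: an estimator with low bias may still have a large RMSE if its individual errors are noisy, which is why the study ranks methods on both.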
Limitations
The study included only tree-based machine learning approaches and did not explore other machine learning approaches. Researchers did not address uncertainty related to the individual treatment effect estimates.
Conclusions and Relevance
RF methods can be used to estimate individual treatment effects with most types of data, especially data with large numbers of predictors. Synthetic RF performs better than the standard RF method. RF methods can work with survival outcomes and with multiple different treatments. In some situations where the sample size is moderate, other methods such as BART can perform as well or better than RF.
Future Research Needs
Future research could examine the data conditions under which specific RF approaches should be used. Studies could also compare these RF methods with other non-tree-based machine learning approaches and test their use in treatment decisions in clinical settings.
Peer-Review Summary
Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.
The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.
Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:
- The reviewers asked for more information about where, how, and what type of empirical data the researchers used for testing the analytical methods. The reviewers asked that the report include a description of the patients, their outcomes, and follow-up. The researchers added a brief description about the data collected from a prospective cohort of patients from the Cleveland Clinic between 1997 and 2007, including a reference to the publication that documents these details.
- The reviewers asked the researchers to explain how they felt that absolute values led to greater fidelity in discovering treatment effects, when these can also be biased in observational data. The researchers explained that at the time of writing they had meant that using absolute values could lead to more success in identifying subgroups with differential responses. However, by the end of the study they no longer agreed with this argument, so they removed the reference to absolute values from the report. The researchers noted that they had instead moved from finding subgroups to getting good individual-specific estimates of treatment response.
- The reviewers asked for a fuller discussion of individual uncertainty around the individual treatment effect estimate, indicating that the range of uncertainty was quite large in practice due to small numbers of observations. The researchers acknowledged that this was a major limitation of the study and expanded their discussion on this with references to recently published approaches for assessing uncertainty in individual treatment effect estimates.