PCORI funded the Pilot Projects to explore how to conduct and use patient-centered outcomes research in ways that can better serve patients and the healthcare community.
Determining the best treatment for an individual is a fundamentally different problem from determining the best treatment, on average, in a population because individuals vary from each other in important ways that influence the risks and benefits of treatment. This idea is known as heterogeneity of treatment effect (HTE). Although enormous efforts have gone into conducting randomized controlled trials (RCTs) for medical interventions, there has been remarkably little research into how best to use trial results to ensure that as many individual patients as possible receive the most appropriate treatment given their personal characteristics.
Researchers previously proposed a framework for the analysis of HTE. This framework prioritizes the analysis and reporting of multivariate risk-based HTE in the form of predictive risk models (e.g., risk scores, risk calculators). In addition, the researchers proposed that other subgroup analyses should be explicitly labeled either as primary (confirmatory) subgroup analyses or secondary (exploratory) subgroup analyses. The overall objective of this pilot project was to evaluate the feasibility and value of this framework across a large set of RCTs.
Reanalysis of RCTs
Participants, Interventions, Settings, and Outcomes
Not applicable because the RCTs included a wide range of participants, interventions, settings, and outcomes.
Researchers used publicly available RCTs from the National Heart, Lung, and Blood Institute, National Institute of Diabetes and Digestive and Kidney Diseases, and other sources. Researchers included data from large RCTs that enrolled at least 1,000 participants, randomized to at least two treatment groups, and had an event or time-to-event clinical outcome.
Using a set of 32 large trials, researchers developed risk models for the primary outcome of each trial. This method was informed by a set of risk models identified in the literature and matched to the trial. These models were then used to stratify trial participants by their risk of the primary outcome.
To describe and quantify risk heterogeneity, researchers used several novel but intuitive measures. Extreme quartile risk ratio (EQRR) describes the ratio of the outcome rates between the highest risk quartile of patients and the lowest risk quartile. Median-to-mean risk ratio (MMRR) is a measure of skewness calculated as the ratio of the median predicted outcome probability to the mean predicted outcome probability. As the MMRR deviates from one, it reflects the degree to which the summary (average) result may not reflect the effects in the “typical” patient in the trial.
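As a minimal sketch of the two measures described above, the following Python computes the EQRR from observed outcomes grouped by quartile of predicted risk, and the MMRR from the predicted risks themselves. The function names, the rank-based quartile assignment, and the example values are illustrative assumptions, not the researchers' code; the summary does not say how ties or quartile boundaries were handled.

```python
from statistics import mean, median


def quartile_groups(predicted_risk):
    """Assign each participant to a risk quartile by rank (0 = lowest, 3 = highest)."""
    n = len(predicted_risk)
    order = sorted(range(n), key=lambda i: predicted_risk[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = min(3, rank * 4 // n)
    return groups


def eqrr(predicted_risk, outcome):
    """Extreme quartile risk ratio: outcome rate in the top quartile / bottom quartile.

    `outcome` holds 0/1 event indicators. Raises ZeroDivisionError if the
    lowest-risk quartile has no events.
    """
    groups = quartile_groups(predicted_risk)
    low = [y for y, g in zip(outcome, groups) if g == 0]
    high = [y for y, g in zip(outcome, groups) if g == 3]
    return mean(high) / mean(low)


def mmrr(predicted_risk):
    """Median-to-mean risk ratio: values below 1 indicate a right-skewed risk distribution."""
    return median(predicted_risk) / mean(predicted_risk)


# Hypothetical toy data: eight participants, a few of them at much higher risk.
risks = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.20, 0.40]
events = [0, 1, 0, 0, 0, 1, 1, 1]

toy_eqrr = eqrr(risks, events)  # top-quartile rate 1.0 vs bottom-quartile rate 0.5
toy_mmrr = mmrr(risks)          # median 0.045 vs mean 0.10125
```

In this toy example the median risk is well below the mean, so the MMRR is far below one: a small high-risk group pulls the average up, which is the pattern the measure is meant to expose.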
Researchers then analyzed the effect of risk on treatment effect, estimated on relative and absolute scales (i.e., relative and absolute risk reduction). HTE was evaluated by testing for an interaction between predicted risk and treatment assignment, a statistical test for the contrast of the treatment effect on the relative risk (or odds ratio) scale. Researchers also compared relative and absolute risk reduction (ARR) in the extreme risk quartiles.
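The distinction between the relative and absolute scales can be illustrated with a short sketch. The function name and the event counts below are hypothetical and chosen only to show the arithmetic; they are not taken from any of the trials, and the sketch does not implement the interaction test itself.

```python
def risk_reductions(events_control, n_control, events_treated, n_treated):
    """Return (absolute risk reduction, relative risk reduction) for one risk stratum."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    arr = risk_control - risk_treated  # absolute scale
    rrr = arr / risk_control           # relative scale
    return arr, rrr


# Hypothetical lowest-risk quartile: 2.0% vs 1.5% event rates.
arr_low, rrr_low = risk_reductions(20, 1000, 15, 1000)

# Hypothetical highest-risk quartile: 20% vs 15% event rates.
arr_high, rrr_high = risk_reductions(200, 1000, 150, 1000)

# Both quartiles have a relative risk reduction of about 0.25, yet the
# absolute risk reduction is about 0.005 in the low-risk quartile and
# about 0.05 in the high-risk quartile.
arr_difference = arr_high - arr_low
```

With the same relative effect in both quartiles, the absolute benefit in this example is ten times larger in the high-risk quartile, which is why a constant relative effect can still conceal clinically important variation in absolute benefit.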
To verify that the use of internal models would not bias estimates of HTE across risk groups, researchers performed a series of simulations. Across a range of scenarios, analyses based on internal risk models developed on trial participants yielded results similar to analyses based on external risk models developed on nonparticipants sampled from the same super-population. To minimize missing data bias, multiple imputation was used when restricting to patients without missing data (i.e., a complete case analysis) would exclude more than 5% of trial participants. To prevent model overfitting, researchers limited the number of variables included in models to maintain an events-per-variable ratio greater than 20 and used bootstrapping to correct for optimism.
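The two numeric rules stated above (the 5% complete-case threshold and the events-per-variable limit) can be expressed as a small sketch. The helper names and default arguments are assumptions made for illustration; the sketch does not cover the simulations or the bootstrap optimism correction.

```python
def needs_multiple_imputation(n_participants, n_complete_cases, threshold=0.05):
    """True when a complete case analysis would exclude more than `threshold` of participants."""
    excluded_fraction = (n_participants - n_complete_cases) / n_participants
    return excluded_fraction > threshold


def max_predictors(n_events, min_epv=20):
    """Largest number of model variables that keeps events-per-variable strictly above `min_epv`."""
    return max(0, (n_events - 1) // min_epv)


# Hypothetical trial: 4,000 participants, 3,700 with complete data, 310 outcome events.
use_imputation = needs_multiple_imputation(4000, 3700)  # 7.5% excluded
variable_budget = max_predictors(310)                   # 310 / 15 is about 20.7 events per variable
```

Under these assumptions the hypothetical trial would use multiple imputation and could support a model with up to 15 variables.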
Researchers identified 32 RCTs that met inclusion criteria, most in the field of cardiovascular disease. Although the typical (median) predicted EQRR was approximately 4, more than a quarter of all trials had an EQRR above 5 and the range exceeded 30. Similarly, although the median MMRR was 0.86 (indicating that the typical patient exhibited an outcome risk 14% lower than the summary [average] risk), this figure ranged from 0.4 to 1 across trials. These metrics suggest that risks (defined on the basis of easily obtainable clinical characteristics) often vary tremendously across trial participants, and that the typical patient is usually at lower risk than the average, because most outcomes are accounted for by a group of high-risk patients who skew the risk distribution.
In the 13 trials with statistically significant results, 18 unique treatment comparisons were analyzed. Overall, there was no apparent relationship between baseline predicted risk and the hazard (or odds) ratios of treatment across trials. Only one of 18 analyses had a significant interaction between treatment and the predicted risk on the proportional scale. ARR varied substantially and was generally higher in high-risk strata. The difference in the ARR between the extreme risk quartiles ranged from −3.2% to 28.3% (median = 5.1%; interquartile range = 0.3%–10.9%). Thus, in a quarter of analyses, patients in the fourth quartile had an ARR that was at least 10 percentage points higher than in the first quartile.
The results of several analyses (Medical Therapy of Prostatic Symptoms, Diabetes Prevention Program, and the Digitalis Investigation Group trial) were so striking that separate clinical manuscripts were published to report the HTE. Each showed that all or most of the benefits of the tested interventions could be obtained by targeting only the high-risk patients (top quartile), without the costs/harms that come with population-wide therapy across all quartiles of predicted risk.
Heterogeneity may be slightly overestimated because of model overfitting or underestimated because of underfitting (i.e., omitting clinically important predictors); more careful model building (exploring nonlinearity and interactions) might have yielded larger estimates of risk heterogeneity. Researchers did not explore nonlinear relationships between predicted risk and treatment effects, which may have revealed additional HTE.
Clinically significant risk heterogeneity is common in trials. The typical patient is generally at lower risk than the trial summary results suggest. Clinically significant differences in absolute treatment effects within trials appear common and would be obscured if trials were analyzed in the conventional way. A risk-stratified approach to trial analysis is feasible and may be most clinically informative when the outcome is predictable and uncommon.