Professional Abstract
Objective
To evaluate and improve analytic approaches for variable selection and treatment-effect estimation in nonrandomized studies with few outcome events and many confounders.
Study Design
Design Element | Description |
Design | 2 simulation studies |
Data Sources and Data Sets | Simulations based on 3 previously published pharmacoepidemiologic cohorts |
Analytic Approach | Simulations based on the plasmode framework were used to examine the following approaches: (1) methods for variable selection and confounder adjustment: high-dimensional propensity score, regularized regression (including ridge regression and lasso regression), and combinations of the 2; (2) methods for using the propensity score to estimate treatment effects: pair matching, full matching, decile strata, fine strata, regression adjustment using one or two nonlinear splines, inverse propensity weighting, and matching weights |
Outcomes | Bias, mean squared error, and precision |
Nonrandomized studies are essential for identifying promising new treatments early, studying rare diseases, and comparing treatment effects in population subgroups often excluded from randomized trials (e.g., children and older adults). However, nonrandomized studies may have few outcome events and numerous confounding variables (i.e., variables associated with both treatment and outcomes). Such studies present significant challenges to drawing causal inferences.
Researchers often use propensity-score (PS) models to control for many measured confounders in estimating the causal effects of treatment. Applying PS models involves two steps. First, researchers identify a set of variables, or confounders, and use them to calculate the PS for each patient. Second, researchers use the PS to estimate treatment effects. Researchers typically apply PS methods when analyzing data with many confounders, but PS methods can be unstable when there are few outcome events. Few studies have explored which PS approaches offer the greatest control for confounding in such scenarios.
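The two steps described above can be illustrated with a minimal sketch. It is not the study's code: the toy cohort, the function names (`fit_logistic`, `propensity_scores`, `ipw_risk_difference`), and the choice of a plain logistic model followed by inverse propensity weighting are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, z, n_iter=25, ridge=1e-8):
    """Fit a logistic regression by Newton-Raphson (intercept first)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (z - p)
        # Tiny ridge term only keeps the Hessian invertible; it does not
        # change the solution, because the gradient is unpenalized.
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def propensity_scores(X, z):
    """Step 1: estimate each patient's probability of treatment."""
    beta = fit_logistic(X, z)
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def ipw_risk_difference(y, z, ps):
    """Step 2: use the PS to estimate a treatment effect (here, by IPW)."""
    w1 = z / ps
    w0 = (1 - z) / (1 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

# Hypothetical toy cohort: 3 confounders, a rare outcome, no true effect.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
z = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1]))))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-3.0 + 0.7 * X[:, 0]))))

ps = propensity_scores(X, z)
rd = ipw_risk_difference(y, z, ps)
```

Inverse propensity weighting is only one of the estimators the study compared; any of the others could replace the second step without changing the first.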
Researchers conducted two simulation studies evaluating PS models that have been proposed in the literature. Researchers based the simulations on three previously published cohort datasets using the plasmode framework. The plasmode framework creates realistic simulated datasets that mimic traits found in real nonrandomized cohort studies based on large healthcare datasets.
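A rough sketch of how a plasmode-style simulation can work is shown below. It assumes a common variant of the idea: covariates and treatment are resampled together from the observed cohort so their real correlations are preserved, and only the outcome is simulated from a model with a known treatment effect. The function `plasmode_datasets`, its parameters, and the stand-in cohort are hypothetical; in practice the outcome coefficients would typically come from a model fit to the real data.

```python
import numpy as np

def plasmode_datasets(X, z, outcome_coefs, intercept, true_log_or,
                      n_sims, size, seed=0):
    """Yield simulated cohorts with real covariates and synthetic outcomes."""
    rng = np.random.default_rng(seed)
    n = len(z)
    for _ in range(n_sims):
        # Resample patients with replacement, keeping each patient's
        # covariates and treatment together.
        idx = rng.integers(0, n, size=size)
        Xb, zb = X[idx], z[idx]
        # Simulate the outcome from a logistic model whose treatment
        # effect (a log odds ratio) is chosen by the analyst, so it is known.
        lin = intercept + Xb @ outcome_coefs + true_log_or * zb
        yb = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin)))
        yield Xb, zb, yb

# Stand-in for an observed cohort (assumption: real data would go here).
rng = np.random.default_rng(1)
X_obs = rng.normal(size=(2000, 4))
z_obs = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * X_obs[:, 0])))

sims = list(plasmode_datasets(
    X_obs, z_obs,
    outcome_coefs=np.array([0.5, 0.3, 0.0, 0.0]),
    intercept=-4.0,            # a low intercept gives few outcome events
    true_log_or=np.log(1.5),
    n_sims=3, size=1000,
))
```

Because the true effect is fixed by design, each estimator's output on the simulated cohorts can be compared directly with the truth.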
One simulation compared the high-dimensional propensity score (hdPS) algorithm with regularized regression approaches, such as ridge regression and lasso regression. The hdPS algorithm prioritizes a subset of potential confounders to include in the PS model. In contrast, regularized regression approaches adjust for all potential confounders when modeling the outcomes.
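The abstract does not say how hdPS prioritizes confounders. The sketch below assumes the ranking rule used in the published hdPS algorithm, which scores each binary covariate by the confounding bias it could induce (the Bross formula) and keeps the top-ranked ones. The function names, the handling of empty cells, and the toy data are illustrative assumptions.

```python
import numpy as np

def bross_log_bias(c, z, y):
    """Absolute log of the Bross bias multiplier for one binary covariate."""
    if c.min() == c.max():
        return 0.0                      # constant covariate cannot confound
    p_c1 = c[z == 1].mean()             # covariate prevalence among treated
    p_c0 = c[z == 0].mean()             # covariate prevalence among untreated
    risk_with = y[c == 1].mean()
    risk_without = y[c == 0].mean()
    if risk_with == 0 or risk_without == 0:
        return 0.0                      # simplification for zero-event cells
    rr = risk_with / risk_without       # covariate-outcome relative risk
    rr = max(rr, 1.0 / rr)              # use 1/RR when RR < 1
    bias = (p_c1 * (rr - 1) + 1) / (p_c0 * (rr - 1) + 1)
    return abs(np.log(bias))

def prioritize(C, z, y, k):
    """Return indices of the k highest-scoring covariates, and all scores."""
    scores = np.array([bross_log_bias(C[:, j], z, y) for j in range(C.shape[1])])
    return np.argsort(-scores)[:k], scores

# Hypothetical data: covariate 0 is a real confounder, the rest are noise.
rng = np.random.default_rng(2)
n = 4000
C = rng.binomial(1, 0.3, size=(n, 5))
z = rng.binomial(1, 0.2 + 0.5 * C[:, 0])
y = rng.binomial(1, 0.02 + 0.10 * C[:, 0])

top, scores = prioritize(C, z, y, k=2)
```

The selected covariates would then enter an ordinary PS model, whereas a lasso or ridge outcome model would instead shrink the coefficients of all candidates at once.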
The other simulation compared a variety of PS-based estimators of the treatment effect across different conditions. These conditions included whether treatment effects were heterogeneous, which means a treatment’s effect differed for different patients, or homogeneous, which means a treatment’s effect was the same across patients.
Researchers used bias and mean squared error of the estimated effects to assess performance.
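For reference, these performance measures can be computed from a set of simulated estimates as in the sketch below. The function name `performance` and the example numbers are hypothetical; the decomposition of mean squared error into squared bias plus variance is standard.

```python
import numpy as np

def performance(estimates, truth):
    """Return (bias, variance, mean squared error) across simulation runs."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - truth
    # Population variance (ddof=0), so that mse == bias**2 + variance.
    variance = est.var()
    mse = np.mean((est - truth) ** 2)
    return bias, variance, mse

# Hypothetical estimates from three simulated datasets, true effect 1.5.
bias, variance, mse = performance([1.0, 2.0, 3.0], truth=1.5)
```

An estimator can have low bias yet high mean squared error if its variance is large, which is why both measures matter when outcome events are few.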
Patient representatives provided input during the study about issues related to nonrandomized research that were important to them.
Results
In the first simulation, the hdPS variable-selection algorithm generally performed better than regularized-regression approaches across conditions. However, using lasso regression for variable selection in a regular PS model also performed well.
In the second simulation, regression adjustment for the PS using one nonlinear spline (a method allowing for nonlinear associations among confounders, PS, and outcomes) and matching weights provided lower bias and mean squared error in the context of rare outcomes. Regression adjustment for the PS using one nonlinear spline provided robust inference when the PS model was misspecified, but it introduced bias when the treatment effect was heterogeneous. Matching weights provided robust inference for heterogeneous treatment effects, but the robustness depended on correct specification of the PS model. Therefore, researchers should evaluate their data to determine whether the treatment effect is likely to be heterogeneous when choosing which approach to use.
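The abstract does not define matching weights. The sketch below assumes the commonly cited definition, in which each patient's weight is the smaller of the PS and one minus the PS, divided by the probability of the treatment the patient actually received. The function names and the four-patient example are hypothetical.

```python
import numpy as np

def matching_weights(ps, z):
    """Weight = min(ps, 1 - ps) / P(treatment actually received)."""
    ps = np.asarray(ps, dtype=float)
    z = np.asarray(z, dtype=float)
    return np.minimum(ps, 1 - ps) / (z * ps + (1 - z) * (1 - ps))

def weighted_risk_difference(y, z, w):
    """Difference in weighted outcome risk, treated minus untreated."""
    y, z, w = (np.asarray(a, dtype=float) for a in (y, z, w))
    treated = z == 1
    return (np.average(y[treated], weights=w[treated])
            - np.average(y[~treated], weights=w[~treated]))

# Hypothetical example with four patients.
ps = np.array([0.2, 0.2, 0.8, 0.8])
z = np.array([1, 0, 1, 0])
y = np.array([1, 0, 0, 0])

w = matching_weights(ps, z)
rd = weighted_risk_difference(y, z, w)
```

Unlike inverse propensity weights, these weights never exceed 1, so no single patient can dominate the estimate, and every patient and outcome event stays in the analysis.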
Limitations
The research team used simulated datasets to explore a variety of realistic scenarios. However, the use of simulated datasets that differ from those the research team used could produce different results. Moreover, the simulated datasets may differ from actual data, and the results derived from this study may not apply to all types of data.
Conclusions and Relevance
Automated variable selection methods, such as hdPS and lasso regression, can help build PS models that appropriately adjust for confounding in comparative effectiveness studies using healthcare databases. However, regularized-regression approaches are not appropriate for simultaneously selecting variables and adjusting for confounding via the outcome model. Treatment-effect-estimation approaches that focus on effects in the feasible population while preserving study size and number of outcomes are likely to lead to better estimates of treatment effect than other popular approaches. Applying these findings to the analysis of nonrandomized healthcare datasets should improve information available to support patient-physician decision making.
Future Research Needs
Future research could explore the performance of these analytical approaches when there are important unmeasured confounding variables or when there is uncertainty in model specification. Future work could also focus on improving understanding of how the relative performance of approaches varies as the number of observed outcome events increases. Finally, there is a strong need to evaluate these approaches with survival (time-to-event) outcomes.