Results Summary
What was the project about?
A randomized controlled trial, or RCT, is often the best way to learn if one treatment works better than another. RCTs assign patients to different treatments by chance. But RCTs are not always feasible. In such cases, researchers can use observational studies. In observational studies, researchers look at what happens when patients and their doctors choose the treatments. Traits such as age, gender, or health status may affect treatment choices. These traits may also affect patients’ health, making it hard to know if changes in patients’ health are due to treatment or to patient traits.
To figure out whether changes in patients’ health result from treatment or something else, researchers use statistical methods. Two of these methods are:
- Propensity score, or PS. PS methods compare the health of patients who have similar measured traits but received different treatments. These traits are in patient health records.
- Instrumental variable, or IV. IV methods account for things that may affect treatment choice and patients’ health but aren’t in the patients’ health records, such as personal preference about treatment.
But existing PS and IV methods don’t work well when data sets include a lot of traits and health conditions for each patient. Such data sets are called high-dimensional data.
In this study, the research team created and tested one PS method and one IV method for use with high-dimensional data.
What did the research team do?
First, the research team created the two new methods for use with high-dimensional data. The team then used a computer program to create test data that look like real patient data. The team applied the new methods to the test data.
Next the research team applied the new methods to real data from previous studies. They applied the PS method to data from a study that looked at a medical test for measuring how well a person’s heart is pumping. The team applied the IV method to data from a study that looked at the effect of education on personal earnings.
Using both test and real data, the research team compared findings from the new methods with those from existing PS and IV methods. They checked whether the new methods gave accurate findings when different patient traits and health conditions were included in the analysis.
What were the results?
Compared with existing methods, the new methods led to more accurate results, even when including a variety of patient traits or health conditions in the analysis. The research team also created a computer program called RCAL that applies the methods in the R statistical software.
What were the limits of the project?
The research team created and tested the new methods for studies that look at patients’ health at one point in time. The methods may not apply to studies that look at patients’ health over time.
How can people use the results?
Researchers can use the new methods to measure treatment effects when analyzing high-dimensional data in observational studies.
Professional Abstract
Background
Randomized controlled trials are often the best way to determine whether differences between patient health outcomes are due to treatments. However, random assignment is not always feasible or ethical. In observational studies, researchers use statistical methods to mimic random treatment assignment. Two such methods are:
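- Propensity scores (PS). A patient's propensity score is the estimated probability of receiving the treatment given the patient's measured characteristics. Researchers use these scores to form groups of treated and untreated patients with similar characteristics, for example by matching or weighting. Researchers then compare health outcomes between the groups.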
- Instrumental variables (IV). Researchers use a variable, the instrument, that affects which treatment a patient receives but does not directly affect the outcome. Comparing patients by the value of the instrument mimics random treatment assignment.
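A minimal sketch of the propensity-score idea follows. It uses standard inverse probability weighting, not the project's new method, and all data are simulated and hypothetical. It assumes a single binary confounder `x` that raises both the chance of treatment and the outcome, so the propensity score can be estimated as the treated share within each level of `x`. With many covariates, this step would instead require a fitted model, which is where the misspecification problem described in this section arises.

```python
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Hypothetical data: binary confounder x raises both treatment odds and outcome.

    The true average treatment effect is 2.0 by construction.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < (0.8 if x else 0.2) else 0
        y = 2.0 * t + 3.0 * x + rng.gauss(0.0, 1.0)
        data.append((x, t, y))
    return data


def estimate_propensity(data):
    """Estimate P(T=1 | x) as the treated share within each level of x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, t, _ in data:
        counts[x][0] += t
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}


def ipw_ate(data):
    """Weighted (normalized) difference in means using inverse propensity weights."""
    ps = estimate_propensity(data)
    if any(p in (0.0, 1.0) for p in ps.values()):
        raise ValueError("no overlap: some level of x has only one treatment group")
    num1 = den1 = num0 = den0 = 0.0
    for x, t, y in data:
        if t:
            w = 1.0 / ps[x]  # treated patients weighted by 1 / P(T=1 | x)
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - ps[x])  # untreated patients weighted by 1 / P(T=0 | x)
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


def naive_difference(data):
    """Unadjusted difference in mean outcome between treated and untreated."""
    treated = [y for _, t, y in data if t]
    control = [y for _, t, y in data if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


data = simulate(20000)
print(f"naive difference: {naive_difference(data):.2f}")  # confounded; expected value 3.8 here
print(f"IPW estimate:     {ipw_ate(data):.2f}")            # close to the true 2.0
```

Weighting each patient by the inverse of the probability of the treatment actually received makes the treated and untreated groups resemble each other on `x`, which removes the confounding that inflates the naive comparison.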
PS and IV methods account for confounders, which are factors that can affect both treatment choices and health outcomes. However, these methods have limitations, especially with high-dimensional data. High-dimensional data have numerous variables, or a moderate number of variables combined with many nonlinear and interaction terms built from them.
To use PS and IV methods with high-dimensional data, researchers make ad hoc choices about which variables and nonlinear and interaction terms to include. These choices may lead to model misspecification, in which a statistical model does not correctly represent how confounders relate to treatment choices or outcomes, for example because a needed variable or term is left out. Misspecified models can produce biased or imprecise estimates. PS and IV methods that account for model misspecification may estimate treatment effects more accurately and reliably when using high-dimensional data.
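To make "many nonlinear and interaction terms for a moderate number of variables" concrete, the short sketch below counts the columns produced by one common expansion. The choice of a full quadratic expansion (main effects, squared terms, and all pairwise interactions of continuous variables) is an illustrative assumption, not the expansion used in this project.

```python
from math import comb


def expanded_term_count(p):
    """Columns in a full quadratic expansion of p continuous variables:
    p main effects + p squared terms + C(p, 2) pairwise interactions."""
    return p + p + comb(p, 2)


for p in (10, 50, 100):
    print(p, expanded_term_count(p))
# 10 65
# 50 1325
# 100 5150
```

Fifty measured variables already yield 1,325 candidate terms, which is why choosing terms by hand is ad hoc and why misspecification becomes likely.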
Objective
To develop and test a new set of PS and IV methods that account for model misspecification when estimating causal effects of treatments using high-dimensional data
Study Design
Design Element | Description |
---|---|
Design | Theoretical development; simulation studies; empirical analysis |
Data Sources and Data Sets | Simulated data; data from a medical study of right heart catheterization; survey data on education and earnings |
Analytic Approach | |
Outcomes | Bias; variance; coverage proportions of confidence intervals |
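The three simulation outcomes can be computed from many repeated simulated data sets. The sketch below is illustrative only: it assumes a simple hypothetical randomized two-arm design, a difference-in-means estimator, and a normal-approximation 95% confidence interval, none of which are the study's actual estimators or data-generating models. Bias is the average estimate minus the true effect, variance is the spread of the estimates across replications, and coverage is the share of intervals that contain the true effect.

```python
import math
import random
import statistics


def one_trial(rng, n=200, effect=1.0):
    """One hypothetical randomized data set; returns (estimate, standard error)."""
    treated = [effect + rng.gauss(0.0, 1.0) for _ in range(n // 2)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n // 2)]
    est = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return est, se


def evaluate(reps=2000, effect=1.0, seed=2):
    """Bias, variance, and 95% CI coverage of the estimator across replications."""
    rng = random.Random(seed)
    estimates, covered = [], 0
    for _ in range(reps):
        est, se = one_trial(rng, effect=effect)
        estimates.append(est)
        if est - 1.96 * se <= effect <= est + 1.96 * se:
            covered += 1
    bias = statistics.fmean(estimates) - effect
    variance = statistics.variance(estimates)
    coverage = covered / reps
    return bias, variance, coverage


bias, variance, coverage = evaluate()
print(f"bias={bias:.3f} variance={variance:.4f} coverage={coverage:.3f}")
```

For a well-behaved estimator, bias should be near zero and coverage near the nominal 0.95. A misspecified model typically shows up as nonzero bias and coverage below 0.95.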
Methods
The research team developed a PS method and an IV method that account for model misspecification when used with high-dimensional data. The PS method assumes there are no unmeasured confounders. The IV method can estimate treatment effects even when some confounders are not measured in the data.
The research team compared the new and existing methods using simulation and empirical analyses with varying degrees of model misspecification. To empirically test the new PS method, the team used data from a medical study about the effects of right heart catheterization. The team tested the IV method with survey data to estimate the causal effect of education on earnings.
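A minimal sketch of the instrumental-variable idea follows. It uses the classic Wald estimator for a binary instrument, not the project's new high-dimensional method, and all data are simulated and hypothetical. It assumes an unmeasured confounder `u` that raises both the chance of treatment and the outcome, and a binary instrument `z` that shifts treatment but affects the outcome only through treatment. The estimators never see `u`.

```python
import random
import statistics


def simulate_iv(n, seed=1):
    """Hypothetical data; the true effect of t on y is 1.5 by construction."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0          # instrument, independent of u
        u = rng.gauss(0.0, 1.0)                      # unmeasured confounder
        t = 1 if z + u + rng.gauss(0.0, 1.0) > 0.5 else 0
        y = 1.5 * t + 2.0 * u + rng.gauss(0.0, 1.0)  # z affects y only through t
        rows.append((z, t, y))                       # u is deliberately not stored
    return rows


def mean_where(rows, index, keep):
    """Mean of column `index` over the rows for which keep(row) is true."""
    return statistics.fmean(row[index] for row in rows if keep(row))


def naive_difference(rows):
    """Unadjusted difference in mean y between treated and untreated."""
    return (mean_where(rows, 2, lambda r: r[1] == 1)
            - mean_where(rows, 2, lambda r: r[1] == 0))


def wald_iv(rows):
    """Wald estimator: effect of z on y divided by effect of z on t."""
    reduced_form = (mean_where(rows, 2, lambda r: r[0] == 1)
                    - mean_where(rows, 2, lambda r: r[0] == 0))
    first_stage = (mean_where(rows, 1, lambda r: r[0] == 1)
                   - mean_where(rows, 1, lambda r: r[0] == 0))
    if abs(first_stage) < 1e-8:
        raise ValueError("instrument does not shift treatment")
    return reduced_form / first_stage


rows = simulate_iv(50000)
print(f"naive difference: {naive_difference(rows):.2f}")  # biased upward by u
print(f"Wald IV estimate: {wald_iv(rows):.2f}")           # close to the true 1.5
```

The ratio divides by the instrument's effect on treatment (the first stage). When that difference is close to zero, small errors in the numerator are magnified. This is the weak-instrument concern discussed in the Peer-Review Summary.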
Results
In simulation analysis, the new methods led to lower bias and more accurate coverage proportions in confidence intervals than existing methods when statistical models were misspecified. In empirical analysis, the magnitudes of treatment effect estimates varied across the new and existing PS and IV methods, but estimates from the new methods had lower standard errors than those from existing methods.
The research team developed RCAL, a computer program, to implement the new methods in R statistical software.
Limitations
The new methods are for cross-sectional observational studies and may not apply to longitudinal or survival studies that examine patient outcomes over time.
Conclusions and Relevance
Researchers can use the new methods to estimate treatment effects when implementing PS or IV analysis with high-dimensional data.
Future Research Needs
Future research could extend the methods to analyze longitudinal and survival data.
Peer-Review Summary
Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.
The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.
Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:
- The reviewers were generally laudatory about this methods-focused research project extending current statistical methods for causal inference when comparing two groups that were not randomly assigned.
- Reviewers did question the researchers’ focus on linear models for predicting the effect of an intervention on treatment outcomes. They pointed out that in typical medical settings the relationship between the intervention and the outcomes is unlikely to be linear because of the effects of unmeasured variables on intervention effectiveness, outcome measurement, and other factors. The researchers explained that their use of generalized linear regression techniques in their models could capture certain aspects of nonlinearity in the relationship of two or more variables; extensions to models that are nonlinear in the parameters could be a subject for future investigations.
- Reviewers noted that the simulation models do not appear to assess how violations of instrumental variable assumptions, or a weak instrumental variable, affect study results. In particular, reviewers were concerned that weak instrumental variables (those with low correlation with the intervention) that are also residually correlated with the outcome could increase bias in treatment effect estimates. The researchers added language to the report stating that investigating weak instrumental variables was beyond the scope of this project and would be a subject of future research.