Developing and Testing New Methods for Estimating Treatment Effectiveness in Observational Studies Using High-Dimensional Data

Results Summary

What was the project about?

A randomized controlled trial, or RCT, is often the best way to learn if one treatment works better than another. RCTs assign patients to different treatments by chance. But RCTs are not always feasible. In such cases, researchers can use observational studies. In observational studies, researchers look at what happens when patients and their doctors choose the treatments. Traits such as age, gender, or health status may affect treatment choices. These traits may also affect patients’ health, making it hard to know if changes in patients’ health are due to treatment or to patient traits.

To figure out whether changes in patients’ health result from treatment or something else, researchers use statistical methods. Two of these methods are:

Propensity score, or PS. PS methods compare the health of patients who have similar measured traits but received different treatments. These traits are in patient health records.
Instrumental variable, or IV. IV methods account for things that may affect treatment choice and patients’ health but aren’t in the patients’ health records, such as personal preference about treatment.

But existing PS and IV methods don’t work well when data sets include a lot of traits and health conditions for each patient. Such data sets are called high-dimensional data.

In this study, the research team created and tested one PS method and one IV method for use with high-dimensional data.

What did the research team do?

First, the research team created the two new methods for use with high-dimensional data. The team then used a computer program to create test data that look like real patient data. The team applied the new methods to the test data.

Next, the research team applied the new methods to real data from previous studies. They applied the PS method to data from a study that looked at a medical test for measuring how well a person’s heart is pumping. The team applied the IV method to data from a study that looked at the effect of education on personal earnings.

Using both test and real data, the research team compared findings from the new methods with those from existing PS and IV methods. They checked to see if findings from the new methods were accurate when they included different patient traits and health conditions in the analysis.

What were the results?

Compared with existing methods, the new methods led to more accurate results, even when including a variety of patient traits or health conditions in the analysis. The research team also created a computer program called RCAL that applies the methods in the R statistical software.

What were the limits of the project?

The research team created and tested the new methods for studies that look at patients’ health at one point in time. The methods may not apply to studies that look at patients’ health over time.

How can people use the results?

Researchers can use the new methods to measure treatment effects when analyzing high-dimensional data in observational studies.

Professional Abstract

Background

Randomized controlled trials are often the best way to determine whether differences between patient health outcomes are due to treatments. However, random assignment is not always feasible or ethical. In observational studies, researchers use statistical methods to mimic random treatment assignment. Two such methods are:

Propensity scores (PS). Researchers use statistical methods to create two groups of patients with similar characteristics who received or did not receive the treatment. Researchers then compare health outcomes between the two groups.
Instrumental variables (IV). Researchers divide the study sample by whether a patient has a characteristic that affects the treatment choice but does not directly affect the outcome, much like randomizing treatment.
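As a rough illustration of the PS idea described above, the following sketch simulates patients with one measured confounder and compares a naive difference in means with an inverse-probability-weighted difference. It uses a textbook weighting estimator, not the regularized calibrated method developed in this project, and every name and number in it (`simulate`, `ipw_effect`, the true effect of 1.0, the treatment probabilities) is a hypothetical choice made for the example.

```python
import random

def simulate(n=20000, seed=0):
    """Simulate (x, t, y) rows: x is a measured binary trait, t is treatment,
    y is the outcome. The true treatment effect is set to 1.0 (assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if x == 1 else 0.2      # trait x affects treatment choice
        t = 1 if rng.random() < p_treat else 0
        y = 1.0 * t + 2.0 * x + rng.gauss(0, 1)  # trait x also affects outcome
        rows.append((x, t, y))
    return rows

def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for x, t, y in rows if t == 1]
    control = [y for x, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def estimate_propensity(rows):
    """Propensity score per level of x: the share of patients treated."""
    ps = {}
    for level in (0, 1):
        group = [t for x, t, y in rows if x == level]
        ps[level] = sum(group) / len(group)
    return ps

def ipw_effect(rows, ps):
    """Normalized inverse-probability-weighted difference in means."""
    num1 = den1 = num0 = den0 = 0.0
    for x, t, y in rows:
        if t == 1:
            w = 1.0 / ps[x]
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - ps[x])
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0

rows = simulate()
ps = estimate_propensity(rows)
naive = naive_difference(rows)
weighted = ipw_effect(rows, ps)
print(f"naive difference: {naive:.2f}")
print(f"PS-weighted difference: {weighted:.2f}")
```

In this toy setup the naive difference overstates the effect because the trait `x` drives both treatment and outcome, while the weighted estimate should land near the simulated effect of 1.0. With a single binary trait the propensity score can be read off as a proportion; with many traits it has to be modeled, which is where the specification choices discussed in this abstract come in.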

PS and IV methods account for confounders that could affect both health outcomes and treatment choices. However, they have limitations, especially when using high-dimensional data. High-dimensional data have numerous variables or many nonlinear and interaction terms for a moderate number of variables.
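To give a sense of how quickly a moderate number of variables becomes high-dimensional, the short sketch below counts candidate model terms when each variable contributes a main effect and a squared term, and every pair contributes an interaction. The function name `expanded_term_count` and the choice of which terms to count are assumptions for illustration, not a definition taken from the project.

```python
from math import comb

def expanded_term_count(p):
    """Count main effects, squared terms, and pairwise interactions
    for p variables (squared terms assume continuous variables)."""
    main_effects = p
    squared_terms = p
    pairwise_interactions = comb(p, 2)
    return main_effects + squared_terms + pairwise_interactions

for p in (10, 50, 200):
    print(p, expanded_term_count(p))
# prints:
# 10 65
# 50 1325
# 200 20300
```

Under these assumptions, 200 variables already yield more than 20,000 candidate terms, which exceeds the sample sizes of both data sets used in this project.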

To use PS and IV methods with high-dimensional data, researchers make ad hoc choices about which variables and nonlinear and interaction terms to include. These choices may lead to model misspecification. Model misspecification can occur when statistical methods do not account for all confounders, which results in biased or imprecise estimates. PS and IV methods that account for model misspecification may estimate treatment effects more accurately and reliably when using high-dimensional data.

Objective

To develop and test a new set of PS and IV methods that account for model misspecification when estimating causal effects of treatments using high-dimensional data

Study Design

Design Element	Description
Design	Theoretical development; simulation studies; empirical analysis
Data Sources and Data Sets	Data from Connors AF, Speroff T, Dawson NV, et al. The effectiveness of right heart catheterization in the initial care of critically ill patients. JAMA. 1996;276(11):889-89 (N=5,735); National Longitudinal Survey (NLS) of Young Men (N=3,010)
Analytic Approach	Developing statistical methods: (1) regularized calibrated estimation for estimating PS in high-dimensional data; (2) model-assisted inference about average treatment effects in PS and IV methods. Testing the new methods: simulation analysis and empirical analysis
Outcomes	Bias; variance; coverage proportions of confidence intervals
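The three outcomes in the table can be made concrete with a small Monte Carlo sketch. It evaluates a very simple estimator (a sample mean with a normal-approximation 95% confidence interval) rather than the project's PS or IV estimators, and the function name `evaluate_estimator`, the true value, and the sample sizes are all hypothetical.

```python
import random
import statistics

def evaluate_estimator(true_mean=1.0, n=100, reps=2000, seed=1):
    """Repeat a simulated study many times and summarize the estimator by
    bias, variance, and the coverage proportion of its 95% intervals."""
    rng = random.Random(seed)
    estimates = []
    covered = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 2.0) for _ in range(n)]
        estimate = statistics.fmean(sample)
        std_error = statistics.stdev(sample) / n ** 0.5
        lower = estimate - 1.96 * std_error
        upper = estimate + 1.96 * std_error
        if lower <= true_mean <= upper:
            covered += 1
        estimates.append(estimate)
    bias = statistics.fmean(estimates) - true_mean   # average error
    variance = statistics.variance(estimates)        # spread across repetitions
    coverage = covered / reps                        # share of intervals containing the truth
    return bias, variance, coverage

bias, variance, coverage = evaluate_estimator()
print(f"bias={bias:.3f} variance={variance:.3f} coverage={coverage:.3f}")
```

A well-behaved estimator shows bias near zero and coverage near the nominal 95%. Coverage that falls well below the nominal level is one common symptom of a misspecified model.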

Methods

The research team developed a PS method and an IV method that account for model misspecification when used with high-dimensional data. The PS method estimates treatment effects in the absence of unmeasured confounders. The IV method estimates treatment effects when the data do not include all confounders.
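The role of an instrument when a confounder is unmeasured can be sketched with the textbook Wald estimator: the difference in mean outcome between instrument groups divided by the difference in treatment rates. This is a generic illustration and not the project's IV method. The simulated data, the effect size of 1.0, and the names `simulate_iv` and `wald_estimate` are assumptions made for the example.

```python
import random

def simulate_iv(n=100000, seed=2):
    """Simulate (z, t, y) rows. z is a binary instrument, u is a confounder
    that is NOT recorded in the returned data, and the true effect is 1.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        u = rng.gauss(0, 1)                                # unmeasured confounder
        t = 1 if 0.8 * z + u + rng.gauss(0, 1) > 0.4 else 0  # z and u drive treatment
        y = 1.0 * t + 1.5 * u + rng.gauss(0, 1)            # z affects y only through t
        rows.append((z, t, y))
    return rows

def mean(values):
    return sum(values) / len(values)

def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    return (mean([y for z, t, y in rows if t == 1])
            - mean([y for z, t, y in rows if t == 0]))

def wald_estimate(rows):
    """Wald IV estimate: outcome difference across instrument groups,
    scaled by the difference in treatment rates across those groups."""
    y_diff = (mean([y for z, t, y in rows if z == 1])
              - mean([y for z, t, y in rows if z == 0]))
    t_diff = (mean([t for z, t, y in rows if z == 1])
              - mean([t for z, t, y in rows if z == 0]))
    return y_diff / t_diff

rows_iv = simulate_iv()
naive_iv = naive_difference(rows_iv)
wald = wald_estimate(rows_iv)
print(f"naive difference: {naive_iv:.2f}")
print(f"Wald IV estimate: {wald:.2f}")
```

Because `u` is absent from the data, adjusting for measured traits cannot remove its influence, and the naive difference lands well above the simulated effect. The Wald ratio recovers a value near 1.0 here only because the simulated instrument is reasonably strong and affects the outcome solely through treatment, which are the assumptions the peer reviewers flagged as untested for weak instruments.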

The research team compared the new and existing methods using simulation and empirical analyses with varying degrees of model misspecification. To empirically test the new PS method, the team used data from a medical study about the effects of right heart catheterization. The team tested the IV method with survey data to estimate the causal effect of education on earnings.

Results

In simulation analysis, the new methods led to lower bias and more accurate coverage proportions in confidence intervals than existing methods when statistical models were misspecified. In empirical analysis, the magnitudes of treatment effect estimates varied across the new and existing PS and IV methods, but estimates from the new methods had lower standard errors than those from existing methods.

The research team developed RCAL, a computer program, to implement the new methods in R statistical software.

Limitations

The new methods are for cross-sectional observational studies and may not apply to longitudinal or survival studies that examine patient outcomes over time.

Conclusions and Relevance

Researchers can use the new methods to estimate treatment effects when implementing PS or IV analysis with high-dimensional data.

Future Research Needs

Future research could extend the methods to analyze longitudinal and survival data.

Final Research Report

View this project's final research report.

Peer-Review Summary

Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.

The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.

Peer reviewers commented, and the researchers made changes or provided responses. Those comments and responses included the following:

The reviewers were generally laudatory about this methods-focused research project extending current statistical methods for causal inference when comparing two groups that were not randomly assigned.
Reviewers did question the researchers’ focus on linear models for predicting the effect of an intervention on treatment outcomes. They pointed out that in typical medical settings the relationship between the intervention and the outcomes is unlikely to be linear because of the effects of unmeasured variables on intervention effectiveness, outcome measurement, and other factors. The researchers explained that their use of generalized linear regression techniques in their models could capture certain aspects of nonlinearity in the relationship of two or more variables; extensions to models that are nonlinear in the parameters could be a subject for future investigations.
Reviewers noted that the simulation models do not appear to assess how violations in instrumental variable assumptions and the weakness of an instrumental variable affect study results. In particular, reviewers were concerned that weak instrumental variables (having low correlation with the intervention) that may be residually correlated with the outcome could increase bias in treatment effects. The researchers added language to the report to indicate that they would investigate weak instrumental variables in future research because it was beyond the scope of this project.

Conflict of Interest Disclosures

View the COI Disclosure Form

Project Information

Principal Investigator:

Zhiqiang Tan, PhD

Organization:

Rutgers, The State University of New Jersey, New Brunswick

Project Budget:

$675,046

DOI - Digital Object Identifier:

10.25302/11.2021.ME.151132740

Project Title:

Improving Causal Inference Methods via Statistical Learning with High-dimensional Data

Key Dates

Approval Date:

July 2016

Project End Date:

November 2021

Year Awarded:

2016

Year Completed:

2021

Study Registration Information

HSR Project Number:

HSRP20164102
