Results Summary

What was the project about?

A randomized controlled trial, or RCT, is often the best way to learn whether one treatment works better than another. RCTs assign patients to different treatments by chance. But RCTs are not always feasible. In such cases, researchers can use observational studies. In observational studies, researchers look at what happens when patients and their doctors choose the treatments. Traits such as age, gender, or health status may affect treatment choices. These traits may also affect patients’ health, making it hard to know whether changes in patients’ health are due to treatment or to patient traits.

To figure out whether changes in patients’ health result from treatment or something else, researchers use statistical methods. Two of these methods are:

  • Propensity score, or PS. PS methods compare the health of patients who have similar measured traits but received different treatments. These traits are recorded in patients’ health records.
  • Instrumental variable, or IV. IV methods account for things that may affect treatment choice and patients’ health but aren’t in patients’ health records, such as personal preference about treatment. To do this, they use a factor, called an instrument, that affects which treatment patients receive but does not otherwise affect their health.
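The two ideas in the bullets can be made concrete with a minimal Python sketch on simulated data. This is not the project's new method, which targets high-dimensional data; it only illustrates the basic logic of each approach in the simplest setting with one trait. Everything in it is a hypothetical assumption: the single measured binary trait `x`, the unmeasured trait `u`, the binary instrument `z`, the true treatment effect of 2.0, and all other numbers. The PS part uses inverse probability weighting, with the propensity score estimated as the share of treated patients at each level of `x`. The IV part uses the Wald estimator for a binary instrument.

```python
import random

# Hypothetical true treatment effect used to generate the simulated data.
TRUE_EFFECT = 2.0


def mean(values):
    return sum(values) / len(values)


def simulate_ps_data(n, rng):
    """One measured binary trait x (hypothetical, e.g. older vs. younger)
    raises both the chance of treatment and the outcome."""
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < (0.8 if x else 0.2) else 0
        y = TRUE_EFFECT * t + 3.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, t, y))
    return rows


def ipw_estimate(rows):
    """Inverse probability weighting: the propensity score is estimated as
    the share of treated patients within each level of x."""
    ps = {level: mean([t for x, t, _ in rows if x == level]) for level in (0, 1)}
    treated = sum(t * y / ps[x] for x, t, y in rows) / len(rows)
    control = sum((1 - t) * y / (1 - ps[x]) for x, t, y in rows) / len(rows)
    return treated - control


def simulate_iv_data(n, rng):
    """u is an unmeasured trait that affects both treatment and outcome;
    z is a hypothetical instrument that shifts treatment choice but has
    no direct effect on the outcome."""
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        u = rng.gauss(0.0, 1.0)
        d = 1 if u + 1.5 * z > 0.75 else 0
        y = TRUE_EFFECT * d + 2.0 * u + rng.gauss(0.0, 1.0)
        rows.append((z, d, y))
    return rows


def wald_iv_estimate(rows):
    """Wald estimator: difference in mean outcome between instrument levels,
    divided by the difference in treatment rates between those levels."""
    y1 = mean([y for z, _, y in rows if z == 1])
    y0 = mean([y for z, _, y in rows if z == 0])
    d1 = mean([d for z, d, _ in rows if z == 1])
    d0 = mean([d for z, d, _ in rows if z == 0])
    return (y1 - y0) / (d1 - d0)


def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, t, y in rows if t == 1]
    untreated = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(untreated)


if __name__ == "__main__":
    rng = random.Random(0)
    ps_rows = simulate_ps_data(50_000, rng)
    iv_rows = simulate_iv_data(50_000, rng)
    print("PS data: naive", round(naive_difference(ps_rows), 2),
          "IPW", round(ipw_estimate(ps_rows), 2))
    print("IV data: naive", round(naive_difference(iv_rows), 2),
          "Wald", round(wald_iv_estimate(iv_rows), 2))
```

In this setup the naive differences should come out well above 2.0, because the trait that drives treatment choice also raises the outcome. The IPW and Wald estimates should land close to 2.0. With only one binary trait, the propensity score can be estimated by simple counting. With many traits, that kind of counting breaks down, and this is the setting the project's methods address.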

But existing PS and IV methods don’t work well when data sets include many traits and health conditions for each patient. Such data are called high-dimensional data.

In this study, the research team created and tested one PS method and one IV method for use with high-dimensional data.

What did the research team do?

First, the research team created the two new methods for use with high-dimensional data. The team then used a computer program to create test data that look like real patient data. The team applied the new methods to the test data.

Next, the research team applied the new methods to real data from previous studies. They applied the PS method to data from a study that looked at a medical test for measuring how well a person’s heart is pumping. The team applied the IV method to data from a study that looked at the effect of education on personal earnings.

Using both test and real data, the research team compared findings from the new methods with those from existing PS and IV methods. They checked whether findings from the new methods remained accurate when different patient traits and health conditions were included in the analysis.

What were the results?

Compared with existing methods, the new methods led to more accurate results, even when the analysis included a variety of patient traits or health conditions. The research team also created RCAL, a software package that applies the methods in the R statistical software.

What were the limits of the project?

The research team created and tested the new methods for studies that look at patients’ health at one point in time. The methods may not apply to studies that look at patients’ health over time.

How can people use the results?

Researchers can use the new methods to measure treatment effects when analyzing high-dimensional data in observational studies.


Peer-Review Summary

Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.

The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments. 

Peer reviewers commented, and the researchers made changes or provided responses. Those comments and responses included the following:

  • The reviewers were generally laudatory about this methods-focused research project, which extends current statistical methods for causal inference when comparing two groups that were not randomly assigned.
  • Reviewers questioned the researchers’ focus on linear models for predicting the effect of an intervention on treatment outcomes. They pointed out that in typical medical settings the relationship between the intervention and the outcomes is unlikely to be linear because of the effects of unmeasured variables on intervention effectiveness, outcome measurement, and other factors. The researchers explained that the generalized linear regression techniques in their models could capture certain aspects of nonlinearity in the relationships among variables; extensions to models that are nonlinear in the parameters could be a subject for future investigation.
  • Reviewers noted that the simulation models do not appear to assess how violations of instrumental variable assumptions and the weakness of an instrumental variable affect study results. In particular, reviewers were concerned that weak instrumental variables (those with low correlation with the intervention) that may be residually correlated with the outcome could increase bias in treatment effects. The researchers added language to the report stating that weak instrumental variables were beyond the scope of this project and that they would investigate them in future research.


Project Information

Zhiqiang Tan, PhD
Rutgers, The State University of New Jersey, New Brunswick
Improving Causal Inference Methods via Statistical Learning with High-dimensional Data

Key Dates

July 2016
November 2021

Last updated: March 14, 2024