Project Summary

Background and significance

Determining “what works for whom” is a key goal in prevention and treatment across many areas, including many questions of comparative effectiveness. By understanding which individuals benefit most from which treatments, clinicians can direct scarce resources to those who will benefit most and help individuals find the treatment that works best for them. Identifying effect moderators (factors that relate to the size and direction of treatment effects) is crucial for the delivery of healthcare interventions, including medications, surgeries, behavioral programs, and other treatments, but doing so is very difficult using standard study designs. Randomized trials are usually the gold standard for estimating average effects because randomization guards against confounding (factors related to both treatment and outcome that can distort estimated treatment effects), but they are typically underpowered to detect moderation. Large-scale nonexperimental studies offer another source of data for examining effect moderation, but they can suffer from confounding. In addition, care is needed to ensure that results are relevant for the target populations of interest, such as the patients in a particular health system. New methods are needed to integrate and harness the available data to learn how to personalize treatments.
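To make the idea of an effect moderator concrete, the following is a minimal sketch, not the project's method. It simulates a hypothetical randomized trial with one binary moderator and estimates moderation as the difference between the two subgroup treatment effects. The function names, effect sizes (1.0 and 2.0), and noise level are all illustrative assumptions.

```python
import random
import statistics


def simulate_trial(n, seed=0):
    """Simulate a hypothetical randomized trial with one binary moderator.

    Assumed data-generating model (illustrative only):
    treatment effect is 1.0 when m == 0 and 3.0 when m == 1.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        m = rng.randint(0, 1)  # binary moderator, e.g. a co-occurring disorder
        t = rng.randint(0, 1)  # randomized treatment assignment
        y = 1.0 * t + 2.0 * t * m + rng.gauss(0.0, 1.0)
        rows.append((m, t, y))
    return rows


def subgroup_effect(rows, m_value):
    """Difference in mean outcome, treated minus control, within one subgroup."""
    treated = [y for m, t, y in rows if m == m_value and t == 1]
    control = [y for m, t, y in rows if m == m_value and t == 0]
    return statistics.mean(treated) - statistics.mean(control)


def moderation_estimate(rows):
    """Moderation as the contrast between the two subgroup effects."""
    return subgroup_effect(rows, 1) - subgroup_effect(rows, 0)


rows = simulate_trial(4000, seed=1)
print("effect when m == 0:", round(subgroup_effect(rows, 0), 2))
print("effect when m == 1:", round(subgroup_effect(rows, 1), 2))
print("moderation (difference):", round(moderation_estimate(rows), 2))
```

In a balanced design like this one, the moderation contrast has roughly twice the standard error of the overall average effect, which is one way to see why a trial sized for the average effect is often underpowered for moderation.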


This work will synthesize, extend, and apply methods for identifying effect moderators when multiple studies are available. The methods will apply broadly and will be illustrated with an example estimating the effects of medication treatment for major depressive disorder (specifically, duloxetine versus vortioxetine), using data from four randomized controlled trials and nonexperimental data from the Duke University Health System electronic health records. The work will: (a) develop statistical methods to identify effect heterogeneity using combined data from experimental and nonexperimental studies; (b) develop diagnostics and methods that focus the identification of effect heterogeneity on particular target patient populations; and (c) develop guidance for the appropriate use of the methods developed.

Study design

The work will proceed through statistical methods development, theoretical derivation, simulation studies based on real data, and data analyses. The methods will be broadly applicable and will be illustrated using a study comparing two medications for major depressive disorder, duloxetine and vortioxetine, across a range of outcomes, including depression symptom measures, remission, dropout, and psychiatric emergency department visits and hospitalizations. Possible effect moderators include co-occurring disorders, demographics, substance use, and indicators of disease severity. The target population of interest will be a broad spectrum of individuals with major depressive disorder in the Duke University Health System.

The team will study and develop innovative statistical methods using two flexible approaches: machine learning and Bayesian analysis. A methodological focus will be on articulating the assumptions underlying the approaches, and understanding when they are appropriate to use. First, the work will adapt existing machine learning and Bayesian methods to use multiple data sources to examine effect heterogeneity. The machine learning methods will allow for flexible combinations of covariates to be identified as moderators. Second, the work will adapt existing Bayesian individual participant data meta-analysis methods to borrow information adaptively across data sources (experimental versus nonexperimental studies) based on the quality of evidence and level of compatibility. An additional advance will be to use methods that help ensure the randomized trial sample data is relevant for the patient populations of interest, and to develop diagnostics for the similarity between data sources. In addition, the team will handle discordant outcome measures (e.g., different mental health scores) across different data sources. This will allow, for example, the use of patient-centered outcomes even when they are not available in all datasets. Researcher and clinician stakeholders will be engaged throughout the study to ensure the methods, communication of results, and identification of moderators of treatments for depression are transparent and clear to clinicians.
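As a rough illustration of what borrowing information across data sources can mean, the sketch below pools a trial estimate with a nonexperimental estimate after inflating the nonexperimental variance by an assumed bias term. This is a simple fixed-discount, precision-weighted average and not the adaptive Bayesian approach described above, where the degree of borrowing would be learned from the data. The function name, its parameters, and the input numbers are hypothetical.

```python
import math


def pool_with_discount(est_rct, se_rct, est_obs, se_obs, bias_sd):
    """Precision-weighted pooling of two effect estimates.

    The nonexperimental estimate is down-weighted by adding an assumed
    bias variance (bias_sd ** 2) to its sampling variance. A larger
    bias_sd means less borrowing from the nonexperimental source.
    """
    var_rct = se_rct ** 2
    var_obs = se_obs ** 2 + bias_sd ** 2  # sampling variance plus assumed bias variance
    w_rct = 1.0 / var_rct
    w_obs = 1.0 / var_obs
    pooled = (w_rct * est_rct + w_obs * est_obs) / (w_rct + w_obs)
    pooled_se = math.sqrt(1.0 / (w_rct + w_obs))
    return pooled, pooled_se


# Hypothetical inputs: a small trial and a large nonexperimental study.
for bias_sd in (0.0, 0.5, 5.0):
    est, se = pool_with_discount(1.0, 0.5, 2.0, 0.1, bias_sd)
    print(f"bias_sd={bias_sd}: pooled={est:.3f}, se={se:.3f}")
```

With `bias_sd` set to zero the large nonexperimental study dominates the pooled estimate. As `bias_sd` grows, the result moves back toward the trial estimate and its standard error.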

Anticipated Impact

By developing advanced computational methods to take full advantage of both experimental and nonexperimental data, this work has the potential to help identify “what works for whom.” This project will use medication treatment for depression as a motivating example, addressing a question highly relevant for patient decision making. The methods, however, will have broad applicability across clinical areas.

Project Information

Elizabeth Stuart, PhD
Johns Hopkins Bloomberg School of Public Health

Key Dates

July 2021
November 2025


Award Type
State
Last updated: March 14, 2022