Project Summary

Background: Electronic health records (EHRs) have tremendous potential to provide information on the care of patients outside controlled research environments. Concern about confounding stemming from a lack of baseline randomization and poorly measured comorbidities remains the greatest obstacle to utilizing these data sources for evidence generation. Two major barriers to improving confounding control for valid estimation of treatment effects across EHR databases include: 1) a lack of uniform coding practices and content across EHR systems, and 2) a lack of generalized analytics for confounding control that maximize use of information across diverse EHR data sources. EHR data are and will remain diverse. The development of data-adaptive analytic algorithms that can control for confounding across diverse EHR systems to estimate treatment effects is critically important for ensuring the validity of multi-site studies. High-dimensional proxy adjustment (HDPA) is a set of methods that substantially expand the amount of information used for confounding adjustment in healthcare databases. HDPA methods have been shown to improve confounding control across a diverse range of studies and have become some of the most widely used tools for confounding adjustment in healthcare databases. Although HDPA methods can leverage the full information content in EHR data to improve confounding control, they require settings that can be sensitive to properties of the given data, including variations in coding practices across EHR sites.

Research Gap: When findings from HDPA analyses differ across EHR sites, it is difficult to determine if this is due to differences in populations and coding practices across sites (heterogeneity) or differences in the performance of the analytics across sites (transportability or validity). Evaluating the transportability of data-adaptive analytics for estimating treatment effects across EHRs is critical to determine which algorithmic settings are likely to produce the most valid evidence when estimating the effectiveness and safety of medical treatments.

Objectives: This proposal seeks to extend HDPA by developing and testing a framework for evaluating the transportability of automated machine learning (ML)-enabled methods for confounding control in causal effectiveness studies across multiple EHR systems.

Aims: This project will integrate recently developed and highly innovative ML validation methods for causal inference (synthetic controls) with recent ML advancements for HDPA. Synthetic control validation methods assess the validity of an applied method for causal inference to tailor analyses to the given study. The use of synthetic controls has been shown to improve validity in studies with few investigator-specified variables but can fail in EHR database studies involving large numbers of variables. The research team has developed a novel framework for generating synthetic controls to address this gap. The team will refine and validate the developed algorithm using 8 empirical studies across 3 distinct EHR systems. Empirical studies will involve a variety of therapeutic areas including: 1) effect of anti-diabetic medications on cardiovascular events, 2) effect of anticoagulants on stroke/bleeding risk, and 3) effect of antibiotics on sepsis requiring hospitalization. The EHR sites will include: 1) Partners Healthcare System in Massachusetts; 2) US National Veterans Health Administration; and 3) Wake Forest Baptist Health in North Carolina. Performance standards based on results from randomized controlled trials (RCTs) will be developed as benchmarks by restricting study populations to closely approximate corresponding RCT populations. Advantages and limitations of the developed framework for assessing transportability of data-adaptive HDPA analyses will be determined across different EHR sites. Stakeholders will provide input to refine the proposal. The team will engage 1) patients/patient advocates for insights on how to communicate study findings; 2) EHR users for documentation conventions; and 3) research methodologists for interface and format of dissemination materials.

Relevance to patient-centered research: Valid analytics are the foundation for patient-centered outcomes research (PCOR). As such, valid analysis of EHR data is and will remain a core aspect of PCORI’s work. Much patient-reported information is recorded primarily in EHRs and is currently underutilized. An important aspect of patient-centered research is studying outcomes that matter to patients using relevant comparison therapies. This requires flexible analytic approaches that can easily incorporate a wide range of treatment-outcome scenarios.

Project Information

Richard Wyss, PhD, MS
Brigham and Women's Hospital

Key Dates

36 months
November 2022


Last updated: September 26, 2023