Project Summary
Real-world evidence is essential to answering clinical questions in comparative effectiveness research (CER) and patient-centered outcomes research (PCOR). In many circumstances, randomized controlled trials (RCTs) are not practical or ethical, and their stringent inclusion/exclusion criteria limit generalizability to vulnerable populations such as frail, elderly people, disadvantaged racial groups, and those at risk for severe morbidity and mortality.
Drawing causal inference from large-scale data collected in real-world clinical settings is therefore critical to forming policy on interventions with patient-centered outcomes. There is a substantial body of causal inference methods for a time-fixed treatment, i.e., treatment assigned at baseline of a cross-sectional study. In comparison, causal inference methods for time-varying treatments, particularly flexible ones that use machine learning, are relatively sparse because of the additional complexities associated with time-varying confounding, selection bias, and longitudinal data structures. Furthermore, existing approaches in this area do not meet the growing challenges posed by complex health data structures and treatment patterns.
Two emerging and important clinical research questions motivate our project. First, the causal effects of multiple COVID-19 treatment strategies (e.g., dexamethasone and remdesivir) on the long-term health consequences of COVID-19 infection (e.g., mental health, cognition, and kidney function at least 9 months post diagnosis) are important but unknown, especially in marginalized populations who are under-represented in clinical trials. Second, the optimal timing and associated causal effects of an antihypertensive treatment, hydralazine, on the full-data distribution of systolic blood pressure (SBP) among hospitalized patients who develop severe hypertension remain unexplored, although such evidence is needed to inform treatment guideline recommendations.
Currently, there are no guideline-based recommendations for the treatment of severe inpatient hypertension, which is both common and costly. These investigations are urgently needed, but they present five main methodological issues that preclude direct application of existing parametric longitudinal causal inference approaches:
1) the treatment is time-varying in relation to study entry (e.g., onset of severe inpatient hypertension or hospital admission upon diagnosis of COVID-19),
2) the outcome of interest goes beyond population means (e.g., percentiles or censored survival outcomes),
3) there can be more than one time-varying treatment, with multiple switches,
4) model misspecification is a potential source of bias, and
5) unmeasured longitudinal confounding is a potential source of bias.
We propose new methods to fill these critical methodological gaps in longitudinal causal inference research and to address the important PCOR questions. We will develop a new, robust marginal structural quantile model to draw simultaneous causal inference about longitudinal treatments across the entire distribution of outcomes, and we will further improve the flexibility of the model by using machine learning (Aim 1). For censored survival outcomes, we will first develop a new joint marginal structural model, in continuous time, for the restricted mean survival time, which, compared to the hazard ratio, is a more robust and interpretable measure: the average event-free survival time over a specified period.
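As a point of reference, the restricted mean survival time up to a horizon tau equals the area under the survival curve from 0 to tau. The minimal Python sketch below illustrates that quantity with an unweighted Kaplan-Meier estimate on a single sample. It is not the proposed joint marginal structural model, and the function names (`kaplan_meier`, `rmst`) and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def kaplan_meier(times, events):
    """Return distinct event times and the Kaplan-Meier survival
    estimate immediately after each of them.

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if the subject was censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)          # subjects still under observation at t
        d = np.sum((times == t) & events)     # events occurring exactly at t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)


def rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier
    step function between 0 and the horizon tau."""
    event_times, surv = kaplan_meier(times, events)
    keep = event_times < tau
    # Step function: height 1 before the first event, then surv[k] after event k.
    grid = np.concatenate(([0.0], event_times[keep], [tau]))
    heights = np.concatenate(([1.0], surv[keep]))
    return float(np.sum(np.diff(grid) * heights))


# Hypothetical toy data: five subjects, two of them censored.
toy_times = [2, 4, 4, 6, 8]
toy_events = [1, 1, 0, 1, 0]
print(rmst(toy_times, toy_events, tau=7))
```

With no censoring and a horizon beyond the last event, this estimate reduces to the sample mean of the event times, which is one reason the quantity is easy to interpret.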
We will then develop a Bayesian likelihood-based machine learning method that accommodates time-varying covariates to estimate a set of weights for correcting time-varying confounding and selection bias due to informative censoring (Aim 2). To address potential violations of the “no unmeasured longitudinal confounding” assumption, we will further develop a flexible and interpretable sensitivity analysis framework. Machine learning will be used to estimate the causal effects adjusted for the posited amount of unmeasured confounding over time. The new approach will be applied to assess the sensitivity of the causal conclusions drawn in the first two aims (Aim 3).
Finally, we will apply the methods developed in Aim 1 and Aim 2 to address the two emerging and important CER questions described above, which remain unanswered. Building on our extensive experience in producing software packages that support the implementation of new statistical methods, we will also develop open-source software for the R computing platform that implements all proposed methods (Aim 4).