Objective
To advance statistical methodology for analyzing data collected during regular health care by (1) comparing test statistics for identifying outcome-dependent visit data and (2) assessing bias in longitudinal statistical estimates arising from clinic data generated at outcome-dependent visits
Study Design
Theoretical calculations; simulation studies
Data Sources and Data Sets
Illustrative visit process data from 50 to 100 patients from each of 4 typical longitudinal studies based on clinical databases with regular and outcome-dependent visits; simulated data generated under outcome-dependent visit processes
Databases: patients who underwent surgery for brain aneurysms, patients with chronic kidney disease, patients who received bone marrow transplants, and men diagnosed with prostate cancer
- Evaluation, via theoretical calculations, of the bias introduced by using standard analysis techniques
- Comparison of test statistics for identifying outcome dependence in data
- 3 standard models for fitting longitudinal data that ignore the outcome-dependent visit process: mixed model regression fit via ML, generalized estimating equations (GEE) with an independent working correlation, and GEE with an exchangeable working correlation
- 3 newer models designed to adjust for outcome dependence: inverse-weighted marginal models, ML-based joint modeling of outcomes and visit times, and 2-stage modeling of the visit process and conditional outcomes
Outcomes
Performance of the test statistics in identifying outcome-dependent data; bias of the various estimators considered
Background
Clinical research uses information collected during patients’ regular health care to support large-scale comparative effectiveness research (CER) and observational studies. Patients with certain conditions often have, in addition to scheduled regular visits, symptom-based clinic visits in which patient characteristics or symptoms relate to the study outcome. For example, a patient who notices condition-related symptoms may schedule a clinic visit outside the regular schedule, yielding data related to the CER outcome being measured; a patient without symptoms who keeps only the regular visit schedule will not have comparable data. In addition, patients with more advanced disease are likely to visit more often. Visits whose frequency or content may be symptom driven are called outcome-dependent visits.
Methods
Ordinary analyses of clinic data are subject to systematic error because clinic visit frequency and timing may be related to the longitudinal outcome being studied. In this study, the research team created simulated data to assess the bias that outcome-dependent visit data introduce into estimates of treatment effects; the team used three standard statistical models and three newer models specifically designed to minimize this bias.
The research team proposed three tests for identifying data that may be affected by the number and timing of outcome-dependent visits. The test statistics incorporated the following patient-level data characteristics: inter-visit time, observed visit time, and total number of visits.
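The idea behind a count-based check can be illustrated with a minimal sketch (this is not the team's actual statistic): under an outcome-dependent visit process, each patient's total number of visits is associated with that patient's outcome level, which even a simple correlation test can detect.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

# Simulate 200 patients whose visit counts depend on a latent health state:
# sicker patients (larger b) visit more often and have higher outcomes.
n = 200
b = rng.normal(0.0, 1.0, n)                      # latent patient effect
counts = rng.poisson(3 + 2 * (b > 0.5)) + 1      # outcome-dependent visit counts
mean_y = b + rng.normal(0.0, 0.5, n) / np.sqrt(counts)  # per-patient mean outcome

# A small p-value flags a possible outcome-dependent visit process.
r, p = pearsonr(counts, mean_y)
print(f"r = {r:.2f}, p = {p:.1e}")
```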
To develop realistic simulations, the research team interviewed a stakeholder panel of four clinician-scientists who oversee clinical databases. The team wanted to determine how regular clinic visits and outcome-dependent clinic visits are defined and to document reasons for outcome-dependent visits. The wide range of simulated conditions included variations in outcome distributions, numbers of clusters and per-cluster sample sizes, degree of outcome dependence in visit data, and extent to which statistical assumptions were met or violated. Outcome-dependence variations included visit probability based on an underlying health problem or outcome, subgroups with more frequent visits, and varied visit probabilities at regularly or randomly scheduled visits.
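One such variation, visit probability tied to the underlying outcome, can be sketched as follows; the parameter values and logistic link are illustrative assumptions, not the study's settings.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_patient(n_times=10, beta_visit=1.0):
    """At each scheduled time point, the chance of an actual visit rises
    with the patient's current underlying outcome (logistic link)."""
    b = rng.normal(0.0, 1.0)                  # patient random intercept
    times = np.arange(n_times) / n_times
    latent = b + 0.5 * times                  # underlying outcome trajectory
    p_visit = 1.0 / (1.0 + np.exp(-(-1.0 + beta_visit * latent)))
    seen = rng.random(n_times) < p_visit      # outcome-dependent attendance
    y = latent + rng.normal(0.0, 0.3, n_times)
    return times[seen], y[seen]

# Patients with higher outcomes contribute more observations, so the naive
# mean of observed outcomes exceeds the true marginal mean (0.225 here:
# E[b] = 0 plus 0.5 times the average scheduled time of 0.45).
samples = [simulate_patient() for _ in range(500)]
naive_mean = np.mean(np.concatenate([y for _, y in samples]))
print(naive_mean)
```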
Results
Identifying outcome-dependent data. When the research team analyzed simulated data with a known level of outcome dependence, two of the three test statistics performed well in identifying outcome-dependent visit processes affecting the data: a test based on the total number of visits for each patient and a test based on the random-effects portion of the linear predictor.
Bias from outcome-dependent visits. Compared with standard statistical models, the newer models designed to reduce bias from outcome-dependent visits did not perform better across all conditions. Bias primarily emerged in relation to the random effects in the model (i.e., intercept, time), not the fixed effects (i.e., group, group by time). The models based on maximum likelihood (ML) exhibited the least susceptibility to bias. Also, results became less biased if the data set included at least a small number of regular visits rather than outcome-dependent data only.
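The intuition behind the inverse-weighted marginal models can be sketched with a toy example: weighting each observed outcome by the inverse of its visit probability (taken as known here, which it would not be in practice; in the actual method the visit intensity must itself be estimated) undoes the over-representation of high-outcome visits.

```python
import numpy as np

rng = np.random.default_rng(3)

# Latent outcomes with true marginal mean 0; the probability that an outcome
# is actually observed (i.e., a visit occurs) increases with the outcome.
n = 20000
latent = rng.normal(0.0, 1.0, n)
p_visit = 1.0 / (1.0 + np.exp(-(latent - 1.0)))
seen = rng.random(n) < p_visit
y = latent[seen] + rng.normal(0.0, 0.3, seen.sum())

naive = y.mean()                                  # biased upward by selection
iiw = np.average(y, weights=1.0 / p_visit[seen])  # inverse-weighted mean
print(naive, iiw)
```

The weighted mean lands near the true value of 0, while the naive mean does not; the price, as the results above suggest, is that performance hinges on how well the visit process is modeled.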
Limitations
The research team used simulation rather than actual clinical data. Other studies could involve different drivers of visit frequency or timing of data availability, and these statistical methods might not apply to those situations. The study did not consider other model classes, such as time-to-event models, which could produce different results. The models developed for this study may not capture unknown features of outcome-dependent visits, as can occur when a model omits important random effects.
Conclusions and Relevance
Because longitudinal data from irregular, outcome-dependent clinic visits can introduce bias, researchers should undertake these analyses with care and should interpret covariates modeled as random effects with caution. These findings indicate that standard ML-based methods are less susceptible to bias than some methods purporting to adjust for outcome dependence. The proposed test statistics can screen data for potential outcome-dependent visit processes that may bias analytic results. By reducing bias from outcome-dependent visits in CER, researchers can help clinicians and patients make better-informed treatment decisions.
Future Research Needs
Future research into the practical application of these models should assess their validity when the assumptions underlying the statistical models are not met. Researchers can also apply these models to clinical databases covering other diseases.