Results Summary
What was the project about?
Comparative effectiveness research compares two or more treatments to see which one works best for which patients. But patient traits, such as age or income, may affect patients’ treatment choices. These traits may also affect patients’ responses to treatments. As a result, researchers may have trouble telling whether a patient’s traits, the treatment, or a mix of the two affected how well a treatment worked.
Statistical methods called matching methods can help address this problem when researchers use patient data to compare the effects of treatments. Matching methods help researchers find data from patients who had similar traits, such as age or race, and received different treatments. Because the patients are similar except for the treatment they receive, the differences in patients’ health can more likely be credited to the treatment. Existing methods work well for comparing two treatments. But they may not work with three or more treatments.
In this study, the research team created two new matching methods to compare the effects of three or more treatments. The team then analyzed the new methods under different conditions to see how well each worked.
What did the research team do?
The research team used a computer program to create test data that looked like real patient data. The test data had information on patient traits and treatments. The team then developed two matching methods.
The research team first compared the new methods with existing methods. They looked at which methods worked better to match similar patients more accurately. Then the team tested each new method under different conditions, such as comparing different numbers of treatments.
What were the results?
The new methods matched patients more accurately than the existing methods.
The two new matching methods worked well under different conditions. For example, one method matched patients well when comparing three treatments, but the other method performed better when comparing more than five treatments.
What were the limits of the project?
This study compared different matching methods on test data created using a computer program. Results may differ when using real patient data. Also, the methods may not be valid in all situations. For example, the methods may not be valid if patients have differences that can affect their health but are not reflected in the data.
How can people use the results?
Researchers can consider using these statistical methods in studies that compare the effects of three or more treatments.
Professional Abstract
Background
To estimate the causal effects of treatment in observational studies, researchers often use matching methods. These methods aim to identify patients with similar characteristics who received different treatments so that differences in their outcomes can be attributed to the treatment they received. Existing matching methods for comparing three or more treatments, such as series of binary comparisons (SBC), common referent patient matching (CRPM), and previously proposed methods using generalized propensity scores (GPS), have certain limitations. For example, SBC and CRPM rely on methods that compare two treatments at a time and may result in non-comparable estimates of treatment effects because they relate to different populations.
Objective
To develop two new matching methods and compare their performance with that of existing methods.
Study Design
Design Element | Description |
---|---|
Design | Simulation studies, empirical analyses |
Data Sources and Data Sets | Simulated data for comparing the treatment effect of three or more treatments; UK Clinical Practice Research Datalink for empirical application |
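Analytic Approach | Existing methods: series of binary comparisons (SBC), common referent patient matching (CRPM), matching on generalized propensity scores (GPS). New methods: basic matching (BM), vector matching (VM). Imputation based on Approximate Bayesian Bootstrap |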
Outcomes | Standardized pairwise bias at each covariate, number of units in the reference group that were retained in the study |
Methods
The research team developed two new methods, basic matching (BM) and vector matching (VM), with variations that included:
- Choice of distance measure, such as the Mahalanobis distance, which tells researchers how similar patients are to each other
- With or without a caliper, a predefined threshold for distance when matching patients
- With or without replacement of matched observations
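The three variations above can be illustrated with a minimal sketch. This is a generic greedy nearest-neighbor matcher for two treatment groups, not the BM or VM algorithm from the study. The function names, the restriction to two covariates, and the data are all assumptions made for illustration.

```python
from math import sqrt


def covariance_2d(points):
    """Return the sample covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between 2-D points a and b under covariance cov."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2  # must be nonzero (covariates not collinear)
    dx, dy = a[0] - b[0], a[1] - b[1]
    # diff^T * inverse(cov) * diff, using the closed-form 2x2 inverse
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return sqrt(max(squared, 0.0))


def greedy_match(reference, comparison, cov, caliper=None, replace=False):
    """Match each reference unit to its nearest comparison unit.

    Returns {reference index: comparison index}. A reference unit is left
    out when no candidate remains or the nearest one exceeds the caliper.
    """
    available = set(range(len(comparison)))
    matches = {}
    for i, ref in enumerate(reference):
        candidates = range(len(comparison)) if replace else sorted(available)
        scored = [(mahalanobis_2d(ref, comparison[j], cov), j) for j in candidates]
        if not scored:
            continue
        distance, j = min(scored)
        if caliper is not None and distance > caliper:
            continue
        matches[i] = j
        if not replace:
            available.discard(j)  # each comparison unit is used at most once
    return matches


# Hypothetical data: (covariate 1, covariate 2) per patient.
reference = [(0.0, 0.0), (1.0, 0.0)]
comparison = [(0.4, 0.0), (10.0, 0.0)]
identity = (1.0, 0.0, 1.0)  # unit variances, no correlation: Euclidean distance

with_replacement = greedy_match(reference, comparison, identity, replace=True)
without_replacement = greedy_match(reference, comparison, identity)
with_caliper = greedy_match(reference, comparison, identity, caliper=1.0)
```

In this toy example, matching with replacement lets both reference patients share the same close comparison patient. Matching without replacement forces the second reference patient onto a distant one, and adding a caliper drops that poor match instead. Because the sketch is greedy, results without replacement depend on the order of the reference units. In practice the covariance passed in would be estimated from data, for example with `covariance_2d`.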
The research team first compared VM with three existing methods: GPS, SBC, and CRPM. They then compared BM and VM by conducting simulations with 3, 5, and 10 treatments. In each simulation, one treatment served as the reference treatment. The team looked at two outcomes for all comparisons:
- The proportion of patients from the eligible population who received the reference treatment and were included in the final matched set
- Standardized pairwise bias, which evaluates the quality of matches on each characteristic
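The two outcomes above can be sketched in a few lines. The abstract does not give the exact formula, so this assumes one common definition of standardized bias: the difference in covariate means between two matched treatment groups, divided by a reference standard deviation. The choice of denominator, the function names, and the data are assumptions for illustration only.

```python
from itertools import combinations
from statistics import mean


def standardized_pairwise_bias(groups, scale):
    """Standardized mean difference of one covariate for every treatment pair.

    groups: {treatment label: covariate values among matched patients}
    scale:  standard deviation used to standardize (assumed here to be
            supplied by the caller, e.g. from the eligible reference group)
    """
    return {
        (a, b): (mean(groups[a]) - mean(groups[b])) / scale
        for a, b in combinations(sorted(groups), 2)
    }


def proportion_retained(n_matched_reference, n_eligible_reference):
    """Share of eligible reference-treatment patients kept after matching."""
    return n_matched_reference / n_eligible_reference


# Hypothetical matched ages under three treatments; A is the reference.
matched_age = {"A": [50, 60], "B": [52, 62], "C": [55, 65]}
bias = standardized_pairwise_bias(matched_age, scale=10.0)
retained = proportion_retained(n_matched_reference=2, n_eligible_reference=4)
```

With three treatments there are three pairs to check for each covariate. Values closer to zero indicate that the matched groups are more alike on that covariate.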
Results
Comparing VM with existing methods. VM performed better than GPS, SBC, and CRPM on both outcomes.
Comparing BM and VM methods. The appropriate matching method depended on the number of treatments.
- Three treatments. VM without replacement produced the lowest standardized pairwise bias, followed by BM using Mahalanobis distance of the GPS and VM with fuzzy clustering. However, VM without replacement had the lowest median proportion of patients matched.
- Five treatments. Overall, BM and VM methods without caliper produced lower standardized pairwise bias, but among methods using a caliper, VM with fuzzy clustering matched the highest proportion of patients.
- Ten treatments. VM with fuzzy clustering produced the least bias.
Limitations
The research team used simulated data sets to explore a variety of scenarios. Results could differ when using different simulation configurations. Analyses depended on the unconfounded assignment assumption that after matching on observed characteristics, no unobserved differences could affect health outcomes. If this assumption is violated, estimates may be biased.
Conclusions and Relevance
BM using Mahalanobis distance of the GPS, with or without a caliper, performs better than other methods for fewer than five treatments. However, as the number of treatments increases, VM with fuzzy clustering performs best, with the lowest standardized pairwise bias.
Future Research Needs
Future research could examine the sensitivity of study results to the unconfounded assignment assumption.
Peer-Review Summary
Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.
The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.
Peer reviewers commented, and the researchers made changes or provided responses. Those comments and responses included the following:
- The reviewers commented that the targeted methods for comparing more than two interventions in an observational study could be biased because the individuals counted in the study appeared to be only those who stayed on the comparison medication for a specific period of time. Thus, although an important outcome was major adverse events, those individuals who experienced those events early in treatment would not be included in the analyses. The researchers explained that they based their analyses on the sample of individuals who were on the targeted treatments 60 days after treatment onset, avoiding inclusion of patients who might have experienced adverse events unrelated to the treatments. In addition, the researchers noted that they applied an intention-to-treat analysis and each patient was followed for three years or until death based on their original group assignment. Any changes in treatment that occurred during that period did not change patients’ treatment group category, but the researchers did consider the amount of time patients were on the index treatments in their analyses.
- The reviewers asked whether the period of time used to assess patient outcomes differed for different patients. The researchers replied that they followed all patients for three years unless the patients died during that period. They accounted for the different periods of follow-up time by using both major cardiovascular events and all-cause mortality as outcomes, rather than just major cardiovascular events.
- The reviewers asked why the researchers used a fixed period of six months as a baseline period to assess the health condition of individuals before they received the experimental intervention. The researchers said they decided on this design after consultations with clinical experts and to be in line with other clinical trials they cited.