Results Summary
What was the research about?
Sometimes a study can get better results using data from different sites. In these cases, researchers may want to share patient data, including personal and private information such as dates of birth and addresses. However, researchers may not want to share data across sites because of worries about patient privacy. Some statistical methods can change patients’ sensitive individual data into summary data that hides individuals’ personal information. These privacy-protecting methods, or PPMs, make it safe to share data across sites. But researchers don’t know if PPMs produce accurate results.
In this study, the research team compared combinations of PPMs with methods that use patients’ individual data.
What were the results?
PPMs provided results similar to those from methods that used patients’ individual data. Some PPMs worked better than others.
What did the research team do?
The research team used a computer program to create data that mimicked real patient data. The team used three ways to change the individual data into summary data. Then they analyzed the summary data using a combination of PPMs. The team compared the results they got using summary data with results from individual data. Finally, the team tested the PPMs using real-world data.
Patients, researchers, and hospital staff provided input during this study.
What were the limits of the study?
The research team studied only some PPMs and types of data. Results may differ with other PPMs or data.
Future research could test PPMs in other research studies with data from different sources.
How can people use the results?
Researchers can use the results when considering how to share data across research sites.
Professional Abstract
Objective
To assess the performance of privacy-preserving analytic methods in patient-centered outcomes research
Study Design
Design Element | Description |
---|---|
Design | Simulation studies, empirical analysis |
Data Sources and Data Sets | Simulated data to resemble comparative effectiveness research data, empirical data from the administrative and clinical databases of Kaiser Permanente & Strategic Partners Patient Outcomes Research to Advance Learning (PORTAL) network |
Analytic Approach | Combination of 3 data sharing methods (risk-set, summary-table, effect-estimate), 2 confounding risk scores (propensity score and disease risk score), and multiple confounding adjustment approaches (matching, stratification, weighting) |
Outcomes | Treatment effect estimates for binary and survival outcomes |
Privacy-preserving analytic methods offer new ways to address patient privacy, data security, data control, and regulatory requirements when collaborating across multiple research sites. These methods analyze summary-level patient data from research sites rather than individual-level patient data, and they may increase researchers’ willingness and ability to collaborate in multisite research studies. Summary-level data retain the patient information needed for statistical analysis, but the data are transformed and summarized to exclude sensitive information and preserve patient privacy.
In this study, the research team developed a set of privacy-preserving analytic methods made up of data sharing methods, confounding scores, and confounding adjustment approaches. The team compared the performance of privacy-preserving methods that use summary-level data with those using individual-level data.
The research team conducted multiple simulations to generate individual-level data that resembled patient data from an observational study. Next, the team transformed the individual-level simulated data to create three different summary-level data sets representing increasing levels of summarization: risk-set data, summary-table data, and effect-estimate data.
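The Python sketch below illustrates two of the three summarization levels for a binary outcome under simplifying assumptions. It is not the research team’s code. The record layout `(stratum, exposed, outcome)`, the function names, and the use of Mantel-Haenszel and fixed-effect inverse-variance pooling are all illustrative assumptions, and risk-set data is not shown. Each hypothetical site holds individual rows in which the stratum could be, for example, a propensity-score category. With summary-table data, a site shares only 2x2 counts per stratum. With effect-estimate data, a site shares only one adjusted log odds ratio and its variance.

```python
import math
from collections import defaultdict

# Hypothetical individual-level row held at one site: (stratum, exposed, outcome),
# where exposed and outcome are 0/1 and stratum is e.g. a propensity-score category.


def to_summary_table(rows):
    """Summary-table data: 2x2 counts per stratum, with no individual rows.

    Each value is [a, b, c, d]: exposed with / without the outcome,
    then unexposed with / without the outcome.
    """
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for stratum, exposed, outcome in rows:
        cell = (0 if outcome else 1) if exposed else (2 if outcome else 3)
        tables[stratum][cell] += 1
    return dict(tables)


def mh_odds_ratio(tables):
    """Mantel-Haenszel odds ratio over an iterable of [a, b, c, d] tables."""
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


def to_effect_estimate(rows):
    """Effect-estimate data: one stratum-adjusted log odds ratio and its variance.

    Uses Woolf's inverse-variance method across the site's strata; 0.5 is added
    to every cell of a table that contains a zero count.
    """
    weighted_sum = total_weight = 0.0
    for a, b, c, d in to_summary_table(rows).values():
        if 0 in (a, b, c, d):
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        log_or = math.log(a * d / (b * c))
        weight = 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)
        weighted_sum += weight * log_or
        total_weight += weight
    return weighted_sum / total_weight, 1.0 / total_weight


def pool_effect_estimates(estimates):
    """Fixed-effect inverse-variance pooling of (log_or, variance) pairs from sites."""
    weights = [1.0 / var for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    return math.exp(pooled), 1.0 / sum(weights)


def make_rows(stratum, a, b, c, d):
    """Expand 2x2 counts into illustrative individual-level rows (toy data only)."""
    return ([(stratum, 1, 1)] * a + [(stratum, 1, 0)] * b
            + [(stratum, 0, 1)] * c + [(stratum, 0, 0)] * d)


site_a = make_rows(0, 3, 7, 2, 8) + make_rows(1, 5, 5, 3, 7)
site_b = make_rows(0, 4, 6, 2, 8) + make_rows(1, 6, 4, 4, 6)

# Summary-table sharing: only count tables leave each site.
shared_tables = [t for site in (site_a, site_b) for t in to_summary_table(site).values()]
or_summary = mh_odds_ratio(shared_tables)

# Individual-level reference: pool all rows, stratify by (site, stratum).
pooled_rows = [((site_id, s), e, o)
               for site_id, rows in enumerate((site_a, site_b))
               for s, e, o in rows]
or_individual = mh_odds_ratio(to_summary_table(pooled_rows).values())

# Effect-estimate sharing: each site sends one (log_or, variance) pair.
or_effect, var_effect = pool_effect_estimates(
    [to_effect_estimate(site_a), to_effect_estimate(site_b)])
```

Because the Mantel-Haenszel estimator depends only on the stratum-level 2x2 tables, pooling the shared tables reproduces the stratified analysis of the pooled individual-level rows exactly. The effect-estimate route gives a close but not identical answer. Its zero-cell correction also illustrates how sparse site-level data can distort estimates when outcomes are uncommon.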
The research team then analyzed each data set using different privacy-preserving methods to estimate treatment effects on health outcomes. To control for confounding in the analyses, the team applied confounder summary scores such as propensity scores (PSs) and disease risk scores (DRSs). The team incorporated these scores into the analyses for confounding adjustment via matching, stratification, or weighting. To evaluate the performance of the various privacy-preserving methods on survival outcomes, the team compared the treatment effect estimates with the true treatment effect used to generate the simulated data.
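The Python code below is a minimal sketch of one of these adjustment approaches: inverse probability of treatment weighting with a propensity score. It uses a small hypothetical data set with a single binary confounder `x`. The propensity score is estimated as the share treated within each level of `x`, which stands in for the regression models typically used. The data, names, and this estimator are assumptions for illustration, not the team’s methods. The data are built so that treatment has no effect within either level of `x`, so the true risk ratio is 1.

```python
from collections import defaultdict

# Hypothetical row: (x, treated, outcome), all coded 0/1.


def propensity_scores(rows):
    """P(treated | x), estimated as the share treated within each level of x."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, t, _ in rows:
        total[x] += 1
        treated[x] += t
    return {x: treated[x] / total[x] for x in total}


def iptw_risk_ratio(rows):
    """Risk ratio under inverse probability of treatment weighting.

    Treated rows get weight 1/PS and untreated rows 1/(1 - PS), creating a
    pseudo-population in which x is balanced between the groups. A PS of
    exactly 0 or 1 in some level of x would make a weight undefined.
    """
    ps = propensity_scores(rows)
    weighted_events = {0: 0.0, 1: 0.0}
    weighted_n = {0: 0.0, 1: 0.0}
    for x, t, y in rows:
        w = 1 / ps[x] if t else 1 / (1 - ps[x])
        weighted_n[t] += w
        weighted_events[t] += w * y
    risk = {t: weighted_events[t] / weighted_n[t] for t in (0, 1)}
    return risk[1] / risk[0]


def crude_risk_ratio(rows):
    """Unadjusted risk ratio that ignores x."""
    events = {0: 0, 1: 0}
    n = {0: 0, 1: 0}
    for _, t, y in rows:
        n[t] += 1
        events[t] += y
    return (events[1] / n[1]) / (events[0] / n[0])


def make_rows(x, treated_events, treated_n, untreated_events, untreated_n):
    """Build toy rows with the given event counts per treatment group."""
    rows = [(x, 1, 1)] * treated_events + [(x, 1, 0)] * (treated_n - treated_events)
    rows += [(x, 0, 1)] * untreated_events + [(x, 0, 0)] * (untreated_n - untreated_events)
    return rows


# Risk is 0.25 in both arms when x = 0 and 0.5 in both arms when x = 1,
# but most treated patients have x = 1.
rows = make_rows(0, 1, 4, 4, 16) + make_rows(1, 8, 16, 2, 4)
crude = crude_risk_ratio(rows)    # ≈ 1.5, confounded by x
adjusted = iptw_risk_ratio(rows)  # ≈ 1.0, the null effect built into the data
```

Comparing `adjusted` with the known true value mirrors, in miniature, how simulated data allow an estimate to be checked against the effect used to generate them. A disease risk score would instead summarize the confounders by predicting outcome risk. Matching or stratification would use either score in place of weights.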
The research team also tested the performance of the privacy-preserving methods using real data from two comparative effectiveness studies on obesity and rheumatoid arthritis from three sites within a clinical data research network. The team applied the same analytic methods used with the simulated data.
To design and implement this study, the research team worked with patients, health system administrators, other researchers, and experts in governance and regulatory compliance.
Results
Both simulation and empirical studies showed that the privacy-preserving methods produced results that were identical or highly comparable to results obtained from individual-level data analysis in most of the scenarios examined.
Simulation studies
- Matching and weighting, when used to adjust for confounding, produced less biased results than stratification in all three summary-level data analyses.
- In simulation scenarios with uncommon outcomes, PS-based methods performed slightly better than DRS-based methods in validity (agreement with the true treatment effect) and in precision of point estimates.
- All privacy-preserving methods produced results similar to those from the individual-level data analyses, with slight variation in PS-based analyses of effect-estimate data in settings with infrequent exposure.
Empirical analysis
- The confounding adjustment approach affected the comparability between the summary-level data analyses and the individual-level data analyses.
- Risk-set data sharing generally performed better in estimating the treatment effect, while summary-table and effect-estimate data sharing more often produced variation in settings with uncommon outcomes and small sample sizes.
Limitations
The research team did not examine other privacy-preserving methods, such as the distributed regression approach. Lack of homogeneity in the data can affect the comparability of the summary data across multiple sites and could affect the decision to combine site-level data.
Conclusions and Relevance
Compared with pooled individual-level data analyses, privacy-preserving methods using summary-level data did not substantially affect the bias or precision of treatment effect estimates in most scenarios examined. These methods may be an alternative to sharing individual-level data when privacy concerns arise in multisite research studies.
Future Research Needs
Future research could assess the performance of privacy-preserving methods in studies with more sites and partitioned data on the same patient from different databases.
Peer-Review Summary
Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.
The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.
Peer reviewers commented, and the researchers made changes or provided responses. The comments and responses included the following:
- Reviewers questioned how representative the viewpoints of the stakeholders selected for interviews were likely to be. In particular, reviewers noted that the policies and procedures of Kaiser Permanente, the institutional stakeholders’ organization, could influence those stakeholders’ views. The researchers stated that they did not believe organizational policies or practices would affect discussions of a more scientific nature. However, they addressed the reviewers’ concerns in their limitations section by discussing the selective nature of their stakeholder sample and its potential impact on their Aim 1 findings.
- Reviewers suggested adding a brief summary of the strengths and limitations of each methodological approach described to stakeholders. The researchers added a table to the discussion section summarizing the strengths and limitations of the analytic approaches considered in this study. The table also ranks the methods under different scenarios so that it can guide stakeholders who might be choosing among these methods for their own analyses.
- Reviewers asked the researchers to provide a fuller summary of the qualitative findings from their interviews with stakeholders and patients. The researchers explained that they would have liked to summarize the findings, but priorities and preferences for data sharing varied considerably between and within stakeholder groups. The researchers felt that this variability was an interesting finding in itself.
- Reviewers suggested adding a glossary of key terms, noting in particular that the terms privacy and confidentiality were used interchangeably. Instead, the researchers aimed to improve clarity by largely omitting the term confidentiality, assuming that the remaining terms would be less confusing.