Sharing and reusing clinical data is key to enabling patient-centered outcomes research (PCOR). As the PCORI Methodology Committee has pointed out, data registries established for conducting PCOR must ensure appropriate privacy and confidentiality protections. Current de-identification or "anonymization" practices have raised increasing concerns because of (1) the inability of current de-identification methods to protect against re-identification and disclosure risks, (2) a lack of transparency and accountability in the use of de-identified data, and (3) a lack of evidence that de-identified data remain useful for PCOR. Together, these challenges may significantly hinder PCOR studies.
The study team proposes to develop a framework for building patient-centered, privacy-preserving statistical data registries for PCOR. The project has three specific aims: (1) develop methods for establishing differentially private data registries from private data that account for the characteristics of PCOR data, (2) develop methods for establishing data registries that use both private and consented data, and (3) develop methods for evaluating and tracking patient privacy risks and for establishing data registries that account for fine-grained patient privacy preferences.
Our approach is patient-centered, data-driven, and research-driven. We address the data characteristics common in PCOR studies, including high dimensionality and high correlation. In addition, we focus on analyzing and tracking patient privacy risks and on developing methods for building data registries that respect fine-grained, personalized privacy preferences. We also have an extensive patient engagement plan to ensure that the resulting methodology is driven by patient perspectives. Once the methodology is developed, we will construct data registries at Emory and UCSD using data extracted from their clinical data warehouses and study the inherent tradeoffs between privacy protection and utility for PCOR studies.
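To make the privacy-utility tradeoff concrete, the sketch below shows the textbook Laplace mechanism for releasing a single patient count with differential privacy. It is a generic illustration, not the method this project will develop. The record layout, the `diabetic` field, and the function names are hypothetical. A counting query changes by at most 1 when one patient is added or removed, so Laplace noise with scale 1/epsilon gives epsilon-differential privacy. A smaller epsilon means stronger privacy but noisier, less useful statistics.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    A counting query has L1 sensitivity 1: adding or removing one patient's
    record changes the true count by at most 1, so noise scale 1/epsilon suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical registry: 300 patients, every third one flagged as diabetic.
    records = [{"diabetic": i % 3 == 0} for i in range(300)]
    for eps in (1.0, 0.1):
        released = private_count(records, lambda r: r["diabetic"], eps, rng)
        print(f"epsilon={eps}: released count = {released:.1f} (true count = 100)")
```

The expected absolute error of each released count equals the noise scale 1/epsilon. Tightening privacy from epsilon = 1 to epsilon = 0.1 therefore increases the typical error about tenfold, which is the kind of tradeoff the registry evaluation would quantify.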
The expected outcomes include (1) a suite of novel algorithms and techniques and a first-of-its-kind software toolkit for building data registries with rigorous privacy guarantees that honor patient privacy preferences; (2) survey and panel findings from patients, clinicians, patient advocates, and other stakeholders on desired practices for establishing patient-centered, privacy-preserving data registries; and (3) evaluation results on real data extracted from the Emory and UCSD clinical data warehouses, with insights into the tradeoffs among utility, privacy, and efficiency. The project will have a significant and timely impact on multiple stakeholders. It will provide a new methodology that complements current de-identification and policy-based data registry practices. It will also give patients rigorous and transparent control over their privacy while they contribute derivatives of their data to PCOR.