Project Summary

Background and Significance

Understanding household and family memberships has always been important in distinguishing the influence of shared genetics and environments on clinical risks and patient outcomes. Electronic health records (EHRs) often include vast information about individuals that, if linked with that of their household or family members, could provide rich and detailed environmental and family history information to support scientific discovery and patient-centered outcomes research. Currently, such linkage does not exist in EHRs. Clinicians documenting a patient’s reported family health history are limited by (1) time, (2) poorly designed EHR systems that do not support accurate and efficient documentation, and (3) patients with limited and often inaccurate medical information about even their first-degree relatives. The ability to make these linkages would help researchers study whether a disease might be due to a shared exposure (like secondhand tobacco smoke), shared genetics (like a genetic cause of heart disease), or both.


The objective of this project is to develop and evaluate novel methods to identify household and family memberships using EHR and administrative data, thereby allowing richer and more accurate clinical and environmental exposure information about households and families to be used for scientific discovery, epidemiology, and public health. These methods will support both patient- and family-centered outcomes research (PCOR). To ensure these methods are adopted and used in an ethical manner, the research team will engage patients and other stakeholders in the design of an ethics framework and guidance materials for responsible conduct of research using family and household linkage. The goal is to use technologies to focus on the health of families and to do so in a way that respects individual privacy and autonomy.

In Aim 1, preliminary household and familial linkage methods will be developed using computer science techniques to identify individuals who belong to the same household or family and to determine their family relationships. In Aim 2, ground truth data on family members and family relationships will be generated. This information will then be used to improve the accuracy of the preliminary methods. In Aim 3, family linkage data will be used to design a proof-of-concept, person-centered, comparative clinical effectiveness trial to improve outcomes related to family violence, such as future family violence (abuse/neglect of child(ren), severe injury including death), adult interpersonal violence, and foster home placement. In Aim 4, the study team will partner with patients, ethicists, and people responsible for ensuring that all research involving humans or human data is conducted in ways that maximize benefits, minimize risks (including loss of privacy), and inform participants of all potential benefits and risks. With these partners, the team will create materials that help ensure researchers use family linkage methods in an ethical way.

Study Design

To develop and test the correctness of family record linkage methods, patient data from the UCHealth system and Children’s Hospital of Colorado, which are adult and pediatric hospitals located on the same medical campus, and from Colorado All Payer Claims data will be linked with Census Bureau data. The Rocky Mountain Research Data Center (RMRDC) will be used to access 2020 Census microdata and other more recent datasets, such as the American Community Survey and the Current Population Survey, to construct family linkages from household data that will serve as the ground truth. All research data activities will be hosted and conducted in the RMRDC’s secure computing environment.
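The summary does not describe the linkage algorithms themselves. As a rough illustration of the general idea only, the sketch below groups records into candidate households by a normalized address string and scores the resulting pairs against a ground-truth set of pairs. Every name in it (`normalize_address`, `group_households`, the `patient_id` and `address` fields) and all sample records are hypothetical, and simple address matching is an assumed stand-in technique, not the study team's method.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical suffix table; a real system would use a much fuller standardizer.
SUFFIXES = {"street": "st", "avenue": "ave", "road": "rd", "drive": "dr"}


def normalize_address(address: str) -> str:
    """Lowercase, strip punctuation, and abbreviate a few street suffixes."""
    text = re.sub(r"[^a-z0-9\s]", "", address.lower())
    return " ".join(SUFFIXES.get(token, token) for token in text.split())


def group_households(records):
    """Map each normalized address to the patient IDs recorded at it."""
    households = defaultdict(list)
    for rec in records:
        households[normalize_address(rec["address"])].append(rec["patient_id"])
    return dict(households)


def candidate_pairs(households):
    """Return every unordered pair of patients who share a household key."""
    pairs = set()
    for members in households.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def precision_recall(predicted, truth):
    """Pairwise precision and recall of predicted links against ground truth."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Made-up records for illustration; no real patient data.
records = [
    {"patient_id": "A1", "address": "12 Elm Street, Apt 4"},
    {"patient_id": "B2", "address": "12 elm st apt 4"},
    {"patient_id": "C3", "address": "98 Oak Avenue"},
    {"patient_id": "D4", "address": "98 Oak Ave."},
    {"patient_id": "E5", "address": "5 Pine Rd"},
]
# Made-up ground-truth household pairs (sorted tuples of patient IDs).
truth = {("A1", "B2"), ("C3", "E5")}

predicted = candidate_pairs(group_households(records))
precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact matching on a normalized address is easy to audit, but it misses people who moved or whose addresses were recorded differently, and it cannot by itself distinguish relatives from unrelated housemates. That is why comparison against an independent ground truth matters.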


In addition to the familial linkage methods, other products include guidelines for the technical implementation of the methods and an ethical framework and guidance for those charged with human subject protection. These outcomes will indirectly benefit patients by (1) improving the quality of health data available for research and (2) providing transparency in linking family data for research. The team will engage patients to introduce its methods and to discuss possible direct benefits versus risks.

*Methods to Support Innovative Research on AI and Large Language Models Supplement
This study received supplemental funding to build on existing PCORI-funded comparative clinical effectiveness research (CER) methods studies to improve understanding of emerging innovations in large language models (LLMs).

Project Information

Toan Ong, PhD, and Lisa Schilling, MD, MSPH
University of Colorado Denver

Key Dates

36 months
November 2022


Award Type
State
Last updated: March 15, 2024