People interact with their environment at multiple levels within a large social system. An individual’s health outcomes are determined by a complex interplay of multilevel factors, including both social determinants of health (e.g., education, employment, and social cohesion) and behavioral determinants of health (e.g., smoking). For example, cancer, the second leading cause of death in the United States, has multiple causes and outcomes shaped by biological, clinical, behavioral, and social influences. Yet these important variables are rarely captured in structured medical codes, although they are often available in narrative clinical text. Clinical Natural Language Processing (NLP) is the key technology for extracting information from unstructured clinical text to support downstream applications that depend on structured data. However, NLP methods for extracting social determinants of health remain understudied, and existing NLP systems for behavioral determinants and adverse events are suboptimal.
Current clinical outcomes studies in PCORI communities are often limited to structured medical codes because there are few NLP systems that can identify and extract the necessary information and populate it into the Common Data Model (CDM) of PCORnet, the National Patient-Centered Clinical Research Network. This proposal seeks to develop NLP methods and systems that extract social and behavioral information and adverse events and connect them with clinical factors (medical concepts generated directly by clinical practice, e.g., diseases and medications) for clinical outcomes research. The proposed NLP system will unlock mentions of social determinants, behavioral determinants, and adverse events from narrative clinical notes and populate them into structured PCORnet CDM databases. The NLP methods proposed in this project will also advance the extraction of general medical concepts from clinical narratives.
This project will leverage the informatics infrastructure and clinical data at two PCORnet Clinical Research Network (CRN) sites: the University of Florida, affiliated with the OneFlorida Clinical Research Consortium (OneFlorida CRC), and Weill Cornell Medicine, affiliated with Insight-NYC, the New York City-based CRN. If successful, this project will provide an easy-to-use package that bridges the gap in using clinical narratives for PCORI and other communities. Developing a successful NLP tool for extracting social determinants, behavioral determinants, and adverse events from clinical narratives requires the involvement of clinicians, patients, researchers, and data managers. The clinician and patient representatives will advise on how this information is mentioned during patient-provider communications and documented in electronic health record systems. Their input will help us determine where different social and behavioral variables are documented and will guide the development of methods and systems.
The researchers will advise on identifying and categorizing the social determinants of health (SDoH), behavioral determinants of health (BDoH), adverse events (AEs), and other clinical factors that are priorities for their own studies. The data manager and analyst representatives will provide feedback on pipelines for populating information into structured databases (e.g., PCORnet CDM) and on how to use NLP-extracted information to form queries that were not previously possible. We will form an advisory panel of all stakeholders and evaluate the system on data from patients with cancer, because cancer outcomes are known to be related to various social and behavioral influences and adverse events.