Project Summary

This research project is in progress. PCORI will post the research findings on this page within 90 days after the results are final.

One of PCORI’s goals is to improve the methods that researchers use for patient-centered outcomes research. PCORI funds methods projects like this one to better understand and advance the use of research methods that improve the strength and quality of comparative effectiveness research.

What is the project about?

Electronic health records, or EHRs, contain data that researchers can use when studying treatments to find what works best for patients. But some of the most detailed information in these EHRs is in clinicians’ notes. Because clinicians don’t write notes in a standard way, researchers can’t easily access this information using current methods.

In this study, the research team is creating methods for obtaining data from EHR notes for use in research. The new methods use natural language processing, or NLP. In NLP, computer programs interpret written language and make it easier to sort and study.

How can this project help improve research methods?

Researchers can use the results to obtain data more efficiently and accurately from clinicians’ EHR notes.

What is the research team doing?

The research team is creating new methods to allow better use of the information in clinicians’ notes. The methods describe, label, and extract information from clinicians’ notes more efficiently and accurately. For example, the methods use section headers and local context from clinicians’ notes to understand the relationships between key phrases.
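
As a rough illustration only, and not the research team's actual method, the idea of using section headers and local context can be sketched in a few lines of Python. Everything here is an assumption made for the example: the header pattern, the short list of negation words, the sample note, and the function names split_sections and find_mentions are all hypothetical.

```python
import re

# Assumed header style: a short capitalized label followed by a colon
# at the start of a line, such as "History:" or "Family History:".
HEADER = re.compile(r"^([A-Z][A-Za-z ]{1,40}):\s*(.*)$")

# Assumed, deliberately tiny list of words that signal negation.
NEGATION_CUES = {"no", "denies", "without", "negative"}


def split_sections(note):
    """Split a note into (header, text) pairs using the assumed header style."""
    sections = []
    header, lines = "UNKNOWN", []
    for raw in note.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # Close the previous section before starting a new one.
            if lines or header != "UNKNOWN":
                sections.append((header, " ".join(lines)))
            header = match.group(1)
            lines = [match.group(2)] if match.group(2) else []
        elif line:
            lines.append(line)
    sections.append((header, " ".join(lines)))
    return sections


def find_mentions(note, phrase, window=3):
    """Find a key phrase and report its section and the words just before it."""
    target = phrase.lower().split()
    mentions = []
    for header, text in split_sections(note):
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - len(target) + 1):
            if tokens[i:i + len(target)] == target:
                left = tokens[max(0, i - window):i]
                mentions.append({
                    "section": header,
                    "context": left,
                    "negated": any(t in NEGATION_CUES for t in left),
                })
    return mentions


# Made-up sample note, not real patient data.
note = """History: Patient fell from a ladder. Denies loss of consciousness.
Family History: Mother had loss of consciousness after a stroke.
Assessment: Mild headache, no nausea."""

for mention in find_mentions(note, "loss of consciousness"):
    print(mention["section"], mention["negated"])
```

In this sketch, the same phrase means different things depending on where it appears: under "History" it is about the patient and is negated by the word "Denies," while under "Family History" it is about a relative. The word window ignores sentence boundaries, so a real system would need far more careful handling than this.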

After creating the new methods, the research team is testing how well they extract information from the notes that doctors record when screening patients with brain injury.

Research methods at a glance

Design Elements    Description
  1. Develop and evaluate new assisted annotation methods that reduce the burden of annotating clinical notes, using novel active learning algorithms that leverage feature expansion and semi-supervised learning
  2. Develop new methods that make efficient use of existing text annotations and of non-annotated text data
  3. Develop and assess new generalizable methods for analyzing the local context of information extracted from text
Approach           NLP

Project Information

Paul M. Heider, PhD
Medical University of South Carolina
Improving Natural Language Processing Methods for Unstructured Clinical Data Reuse

Key Dates

August 2019
December 2025

Last updated: February 8, 2024