Results Summary
What was the project about?
Electronic health records, or EHRs, have information about a patient’s health such as test results, diagnoses, and treatments. EHRs also have clinical notes that doctors and patients can use to track goals and decisions.
Clinical notes may be useful for research or to help improve care. But it’s hard to get information from these notes across large groups of patients. The notes may use different ways to describe the same thing. For example, high blood pressure may be called hypertension. Also, the notes may use abbreviations or have spelling mistakes.
In this project, the research team designed and built a search engine to make EHR notes easier to search and use for patient care and research.
What did the research team do?
To develop a new method for searching clinical notes in EHRs, the research team used 66 million clinical notes from patient visits at Nationwide Children’s Hospital in Ohio from 2006 to 2016. Using the new method, the team built a search engine called QREK. QREK stands for Query Refinement by word Embedding and Knowledge base. QREK finds and pulls out EHR notes that are related to keywords entered into it. It can also suggest other relevant keywords and common alternatives.
The research team tested QREK in two ways. First, they asked three doctors to rate the relevance of terms suggested by QREK across 11 searches. Second, the team looked at how often QREK correctly suggested a synonym for a known medical term.
The research team tested the final version of QREK under nine different scenarios with people at Nationwide Children’s Hospital to get feedback about its usefulness. For example, some people used QREK to do research; others used QREK to help improve care.
Patients, hospital administrators, health insurers, health information technology specialists, researchers, and clinicians provided input during the study.
What were the results?
The research team found that about 72 percent of the terms suggested by QREK were relevant to the original search term. Also, QREK’s first 60 suggested terms included 54 percent of the synonyms on a standard list of known medical terms.
In the nine scenarios tested, people reported that QREK improved their use of EHR notes.
What were the limits of the project?
Testing occurred at a children’s hospital using keywords for children’s care.
Future research could continue to refine QREK and test QREK with EHR notes from adult care settings.
How can people use the results?
Researchers and hospital staff can use QREK to search and use notes in EHRs. QREK is available free of charge.
How this project fits under PCORI’s Research Priorities
The research reported in this results summary was conducted using PCORnet®, the National Patient-Centered Clinical Research Network. PCORnet® is intended to improve the nation’s capacity to conduct health research, particularly comparative effectiveness research (CER), efficiently by creating a large, highly representative network for conducting clinical outcomes research. PCORnet® has been developed with funding from the Patient-Centered Outcomes Research Institute® (PCORI®).
Professional Abstract
Background
Researchers can use electronic health records (EHRs) as a source of data for comparative effectiveness research. Clinical notes in EHRs contain details about clinical decision making and patient perspectives. But these data often go unused due to a lack of robust search methods.
A text search engine can extract information from unstructured textual notes and make it usable for research. However, existing methods used by text search engines do not account for misspellings or the use of plain language for medical terms. Developing user-friendly, efficient, and precise search methods can help extract text from clinical notes for use in improving research and clinical care.
Objective
(1) To design and build a search engine to extract relevant clinical text from EHRs efficiently; (2) To assess the performance of the new search engine
Study Design
Design Element | Description |
---|---|
Design | Search method development, software development |
Data Sources and Data Sets | More than 66 million clinical notes documenting patient encounters at Nationwide Children’s Hospital from 2006 to 2016 |
Analytic Approach | Unsupervised neural network (word embedding) method; performance assessment; user interface testing; use cases |
Outcomes | Precision, recall, usability |
Methods
The research team first established a new methodological framework to efficiently search for relevant clinical text from EHRs using an original query term. To create the framework, the team used more than 66 million clinical notes documenting patient encounters at Nationwide Children’s Hospital in Ohio from 2006 to 2016. The framework included possible refinements for common queries and categorized relationships between original query terms and query refinements.
Next, the research team developed a web-based interactive search engine called Query Refinement by word Embedding and Knowledge base (QREK). Given a user’s input query, QREK generates a list of relevant keywords, including word variations such as formal or informal forms, synonyms, abbreviations, and misspellings, and other relevant words like related diagnoses, medications, and procedures.
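The core idea behind this kind of query refinement is to find terms whose embedding vectors lie close to the query's vector. The sketch below illustrates the general technique with cosine-similarity nearest neighbors; the vocabulary and vectors are hypothetical toy values, not QREK's actual model, which was trained on millions of clinical notes.

```python
# Minimal sketch of embedding-based query refinement.
# NOTE: toy, hand-made vectors for illustration only; a real system would
# learn these from an unsupervised neural embedding model trained on notes.
import numpy as np

vocab = {
    "hypertension":        np.array([0.9, 0.1, 0.0]),
    "high blood pressure": np.array([0.85, 0.15, 0.05]),
    "htn":                 np.array([0.88, 0.12, 0.02]),  # common abbreviation
    "asthma":              np.array([0.1, 0.9, 0.3]),     # unrelated condition
}

def suggest(query: str, k: int = 3) -> list:
    """Return the k vocabulary terms closest to the query by cosine similarity."""
    q = vocab[query]

    def cosine(v):
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))

    ranked = sorted((t for t in vocab if t != query),
                    key=lambda t: cosine(vocab[t]), reverse=True)
    return ranked[:k]

print(suggest("hypertension", k=2))  # abbreviation and plain-language form rank first
```

Because misspellings and abbreviations occur in similar contexts in real notes, a model trained on those notes places them near the formal term, which is what lets this kind of search surface word variations automatically.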
The research team then assessed the performance of QREK in two ways. First, the team asked three hospital residents to conduct 11 predefined queries and assess the relevance of terms suggested by QREK. Second, the team assessed QREK’s ability to recall known synonyms from 6,682 terms in the Systematized Nomenclature of Medicine (SNOMED). The team calculated the percentage of SNOMED synonyms that QREK suggested among the first 60 search results.
The research team tested and refined the QREK user interface with six clinical residents. They then implemented the final version of QREK in nine use cases at Nationwide Children’s Hospital.
Patients, hospital administrators, health insurers, health information technology specialists, researchers, and clinicians provided input during the study.
Results
On average, among the 11 predefined queries, precision was 0.72 (i.e., 72% of QREK results were found to be relevant). Among the 6,682 synonym searches, recall was 0.54 (i.e., 54% of the known SNOMED synonyms appeared among QREK’s first 60 suggestions).
In the use cases, clinicians, researchers, patients, and caregivers reported increased utility of EHRs with QREK.
Limitations
Testing occurred at a pediatric hospital using only pediatric keywords; performance with adult care keywords is unknown.
Conclusions and Relevance
QREK can help users extract relevant text from clinical notes, with the goal of improving the use of these notes in research and clinical care.
Future Research Needs
Future research could continue to refine QREK and test it in adult care settings.
Final Research Report
View this project's final research report.
Peer-Review Summary
Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.
The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.
Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:
- The reviewers commended the researchers on an interesting study. They found many sections easy to understand but other sections overly technical. The researchers addressed this issue by adding plain-language summaries at the end of technical sections of the report and a glossary to explain unfamiliar terms.
- The reviewers suggested that the source code for the language processing tool the researchers created should be made available publicly so the tool could be applied in other locations. The researchers added information to the report about how readers could request the source code, which would be made available without cost.
- The reviewers noted that user feedback included in the report appeared to be anecdotal rather than systematic and asked the researchers to demonstrate their systematic approach to collecting feedback from usability testing. The researchers responded that they completed systematic usability testing through other funding and reported on that in a published article. Therefore, they obtained only informal feedback from individual research teams for the current report.