Results Summary

What was the project about?

The way doctors communicate with patients during office visits can affect the quality of care. Studying conversations between doctors and patients can help doctors improve their communication skills.

To study conversations, researchers rely on written records, or transcripts, of office visits. They read the transcripts and give each conversation topic a label. For example, topics may include smoking or pain. But labeling topics in this way may take a lot of time.

In this project, the research team created and tested a new method to make this work easier using natural language processing, or NLP. With NLP, computer programs interpret written language. NLP methods use a process called machine learning, where computer programs use data to learn how to perform different tasks with little or no human input.

What did the research team do?

First, the research team trained the NLP computer program to label topics in transcripts. The team used 279 transcripts of patient-doctor conversations from two previous studies. A researcher had already assigned labels to topics in these transcripts. Using these transcripts, the NLP computer program learned to associate specific words with specific labels. For example, the word cigarette would be labeled as smoking. The program could then label topics in other transcripts.
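The idea of learning word-label associations from hand-labeled transcripts can be sketched in a few lines of Python. This is a hypothetical illustration, not the team's actual program: the tiny training set, the word-counting approach, and the function names are all invented for clarity.

```python
# Hypothetical sketch: learn which words co-occur with which human-assigned
# topic labels, then use those associations to label new utterances.
from collections import Counter, defaultdict

# Tiny invented stand-in for labeled training transcripts: (utterance, topic).
training = [
    ("do you still smoke a cigarette after dinner", "smoking"),
    ("how many cigarettes per day", "smoking"),
    ("rate your pain from one to ten", "pain"),
    ("is the pain worse at night", "pain"),
]

# Count how often each word appears under each topic label.
word_label_counts = defaultdict(Counter)
for utterance, label in training:
    for word in utterance.split():
        word_label_counts[word][label] += 1

def label_utterance(utterance):
    """Pick the topic whose associated words best match the utterance."""
    votes = Counter()
    for word in utterance.split():
        for label, count in word_label_counts[word].items():
            votes[label] += count
    return votes.most_common(1)[0][0] if votes else "unknown"

print(label_utterance("any chest pain today"))       # -> pain
print(label_utterance("she smokes a cigarette"))     # -> smoking
```

Real systems use far richer statistical models, but the core step is the same: words seen with a label during training vote for that label on new text.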

Next, the research team used the NLP computer program to create three types of NLP methods. Each method used a different statistical approach to label topics. The methods were called non-sequential, window-based, and sequential.
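One common way these three approaches differ is in how much surrounding context each utterance's label can draw on. The sketch below is an assumption-laden illustration of that distinction (the utterances, function names, and window size are invented; the report does not specify these details).

```python
# Hypothetical contrast between the approaches described above.
utterances = [
    "how has your diet been",
    "any cigarettes lately",
    "let's talk about your pain",
]

def nonsequential_features(i):
    # Non-sequential: each utterance is labeled using its own words only.
    return utterances[i].split()

def window_features(i, window=1):
    # Window-based: words from neighboring utterances are included too.
    lo, hi = max(0, i - window), min(len(utterances), i + window + 1)
    return [w for u in utterances[lo:hi] for w in u.split()]

# A sequential method would go further still, letting the label predicted
# for utterance i-1 influence the label chosen for utterance i.
print(nonsequential_features(1))  # words from utterance 1 only
print(window_features(1))         # words from utterances 0, 1, and 2
```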

Then the research team tested how well the NLP methods worked to label topics in new transcripts. The team compared topic labels from the new NLP methods with topic labels assigned by a person. They also compared the new methods with a basic NLP method that only labeled the most common topics.
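The comparison step amounts to measuring agreement between the method's labels and a person's labels. A minimal sketch, with invented example labels (the actual study used more detailed evaluation):

```python
# Hypothetical evaluation sketch: compare method labels against human labels
# and report simple agreement (accuracy). The label lists are invented.
human  = ["pain", "pain", "smoking", "diet", "smoking"]
method = ["pain", "diet", "smoking", "diet", "smoking"]

agreement = sum(h == m for h, m in zip(human, method)) / len(human)
print(f"agreement: {agreement:.0%}")  # -> agreement: 80%
```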

Patients provided input during the project, including how to explain complex research methods in a clear way.

What were the results?

The three new NLP methods labeled topics more accurately than the basic NLP method. The sequential NLP method labeled topics more like a person than the non-sequential NLP method.

When topics were similar, like weight and diet, the methods were less accurate.

What were the limits of the project?

How well the new NLP methods work depends on the data used to train the NLP computer program. The methods may be less accurate for topics that patients and doctors rarely talk about.

Future studies could test whether adding other data, like doctors' notes, to the training transcripts improves the methods' accuracy.

How can people use the results?

Researchers can use the new NLP methods when studying conversations between patients and doctors to help doctors improve their communication skills.

Peer-Review Summary

Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.

The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments. 

Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:

  • The reviewers asked for clarity regarding the goals of the project. They noted that although the report alludes to how providers use their time in a clinic visit and to understanding the contents of a clinic visit, these concepts are not addressed further in the report. The researchers explained that they offered time use and visit content as examples of how the methods developed in this study could be applied. The researchers revised their conclusions to describe more directly how the study results can inform providers about how to use their time and give them a greater understanding of visit content.
  • Reviewers asked why, if a goal of the study was to evaluate emotions during the clinic visit, the coders used transcripts of the visits rather than audio recordings. The researchers explained that they did not have access to the audio recordings, and they confirmed that the transcripts provided important emotional information as well.
  • The reviewers appreciated the extent of community involvement in this study and the fact that patients saw the future use of natural language processing as a feedback mechanism that empowers patients to advocate for their own health care. The reviewers suggested that the researchers consider asking patients to review the visit transcripts and evaluate how well these methods captured the content and emotional valence of the doctor-patient interaction. The researchers noted that they did involve patients in informal meetings where the content of clinic visits was discussed but had not made patient review a formal part of the process. They agreed that this would be an interesting and useful addition to their study methods in the future.
  • The reviewers asked the researchers to estimate what level of accuracy would indicate that the natural language processing methods are good enough to assess the content and emotional valence of doctor-patient interactions. The researchers agreed that this might be helpful but noted that an accuracy measure would not necessarily show whether a method is good enough. They gave an example from the study results: their method achieved no more than 62 percent accuracy for topic classification, yet it matched human predictions quite well, including by making similar errors.

Project Information

Zac E. Imel, PhD
Ming Tai-Seale, PhD, MPH
University of Utah
Development of Computational Methods for Evaluating Doctor-Patient Communication

Key Dates

December 2016
July 2021

Last updated: October 18, 2023