Results Summary
What was the project about?
The way doctors communicate with patients during office visits can affect the quality of care. Studying conversations between doctors and patients can help doctors improve their communication skills.
To study conversations, researchers rely on written records, or transcripts, of office visits. They read the transcripts and give each conversation topic a label. For example, topics may include smoking or pain. But labeling topics in this way may take a lot of time.
In this project, the research team created and tested a new method to make this work easier using natural language processing, or NLP. With NLP, computer programs interpret written language. NLP methods use a process called machine learning, where computer programs use data to learn how to perform different tasks with little or no human input.
What did the research team do?
First, the research team trained the NLP computer program to label topics in transcripts. The team used 279 transcripts of patient-doctor conversations from two previous studies. A researcher had already assigned labels to topics in these transcripts. Using these transcripts, the NLP computer program learned to associate specific words with specific labels. For example, the word cigarette would be labeled as smoking. The program could then label topics in other transcripts.
Next, the research team used the NLP computer program to create three types of NLP methods. Each method used a different statistical approach to label topics. The methods were called non-sequential, window-based, and sequential.
Then the research team tested how well the NLP methods worked to label topics in new transcripts. The team compared topic labels from the new NLP methods with topic labels assigned by a person. They also compared the new methods with a basic NLP method that only labeled the most common topics.
Patients provided input during the project, including how to explain complex research methods in a clear way.
What were the results?
The three new NLP methods labeled topics more accurately than the basic NLP method. The sequential NLP method labeled topics more like a person than the non-sequential NLP method.
When topics were similar, like weight and diet, the methods were less accurate.
What were the limits of the project?
How well the new NLP methods work depends on the data used to train the NLP computer program. The methods may be less accurate for topics that patients and doctors rarely talk about.
Future studies could test if adding other data to the training transcripts, like doctor’s notes, improves the methods’ accuracy.
How can people use the results?
Researchers can use the new NLP methods when studying conversations between patients and doctors to help doctors improve their communication skills.
Professional Abstract
Background
How clinicians communicate with patients during primary care visits can affect patient satisfaction and quality of care. Studying patient-clinician conversations can help clinicians improve their communication skills. Traditional methods for studying these conversations involve people reading transcripts of the conversations and adding labels to identify discussion topics based on specific criteria. Although these methods are often accurate, they are also labor intensive. Natural language processing (NLP), a type of machine learning method, automates the process of interpreting and labeling topics in transcribed conversations. NLP may offer an accurate and more efficient way to study these conversations.
Objective
To develop and evaluate machine learning NLP models for labeling topics in patient-clinician conversations
Study Design
| Design Element | Description |
| --- | --- |
| Design | Development of machine learning NLP models |
| Data Sources and Data Sets | Transcripts of audio recordings of patient-provider interactions from the Mental Health Discussion (MHD) study and transcripts of video recordings from the Assessment of Doctor-Elderly Patient Encounters (ADEPT) study |
| Analytic Approach | Development and comparison of independent NLP models, window-based NLP models, and fully sequential NLP models |
| Outcomes | Accuracy of topic classification, precision, recall, F1 scores |
Methods
To develop NLP machine learning algorithms and models, researchers first trained the NLP algorithms to label topics in patient-clinician conversations. Researchers used 279 transcripts of patient-clinician conversations from two earlier studies; these transcripts already carried manually assigned labels drawn from a set of 36 topics, such as physical examination, cigarette use, or pain. The NLP algorithms learned to associate specific words in a conversation with the manually assigned topic labels and could then predict labels for other transcripts based on those associations.
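As a concrete illustration of this training step, the sketch below shows one simple way such word-to-label associations could be learned: a bag-of-words classifier fit on labeled talk turns. The talk turns, labels, and scikit-learn pipeline are illustrative assumptions, not the study's actual data or code.

```python
# Minimal sketch: a classifier that labels each talk turn from its words alone
# (illustrative data; not the study's pipeline).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical talk turns with manually assigned topic labels
turns = [
    "I smoke about a pack of cigarettes a day",
    "Let me listen to your heart and lungs",
    "The pain in my knee gets worse at night",
]
labels = ["cigarette use", "physical examination", "pain"]

# Bag-of-words features feed a simple classifier, so words like "cigarettes"
# become associated with the "cigarette use" label during training.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(turns, labels)

# The trained model can then predict topic labels for new transcripts.
print(model.predict(["Do you still smoke?"]))
```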
Next, researchers developed three types of NLP classification models called non-sequential, window-based, and sequential. Each type of model used a different statistical method to label topics in the transcripts.
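To give a sense of how these approaches differ, the sketch below illustrates the window-based idea: the input for each talk turn also includes its neighboring turns, so nearby context can inform the label. The window size and example turns are illustrative assumptions about the general technique, not the study's implementation; a non-sequential model would use only the turn itself, and a fully sequential model would process the entire ordered conversation.

```python
# Minimal sketch of window-based context: each turn is paired with its
# neighbors before classification (window size is an illustrative choice).
def windowed_text(turns, i, window=1):
    """Return turn i concatenated with up to `window` turns on each side."""
    lo, hi = max(0, i - window), min(len(turns), i + window + 1)
    return " ".join(turns[lo:hi])

turns = [
    "How has your diet been lately?",
    "I have been eating more vegetables.",
    "And are you getting any exercise?",
]
# The middle turn is classified together with one turn of context on each side.
print(windowed_text(turns, 1))
```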
Researchers then evaluated the models’ accuracy in labeling topics compared with the manually assigned labels in the same transcripts. They also compared each model’s topic labels with a baseline NLP model that labeled the most common topics.
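The comparison against the manually assigned labels can be summarized with the outcome measures listed in the study design table. The sketch below shows one way accuracy, precision, recall, and F1 could be computed with scikit-learn; the labels are illustrative, not the study's results.

```python
# Minimal sketch of the evaluation: compare predicted topic labels with
# manually assigned labels (illustrative labels, not study data).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

manual    = ["pain", "diet", "pain", "exercise", "cigarette use"]
predicted = ["pain", "exercise", "pain", "exercise", "cigarette use"]

accuracy = accuracy_score(manual, predicted)
precision, recall, f1, _ = precision_recall_fscore_support(
    manual, predicted, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} F1={f1:.2f}")
```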
A patient advisory board provided input throughout the study, including how to explain complex research methods in a clear way.
Results
For all three model types, topic label accuracy was greater than the baseline NLP model’s accuracy level of 30%. The sequential models were more similar to manual labeling than the non-sequential models were.
One type of sequential model, called a hierarchical gated recurrent unit (GRU), had the highest accuracy for labeling topics, ranging from 62% to 79%.
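For readers unfamiliar with the architecture behind that name, the sketch below outlines a hierarchical GRU: a word-level GRU encodes each talk turn, and a turn-level GRU runs over the sequence of turn encodings to predict one topic per turn. The PyTorch implementation, layer sizes, and single shared vocabulary are assumptions for illustration, not the study's exact model.

```python
# Minimal sketch of a hierarchical GRU topic labeler (illustrative sizes).
import torch
import torch.nn as nn

class HierarchicalGRU(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_topics=36):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.word_gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)   # encodes words within a turn
        self.turn_gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)  # encodes the sequence of turns
        self.classifier = nn.Linear(hidden_dim, num_topics)               # one topic prediction per turn

    def forward(self, conversation):
        # conversation: (num_turns, max_words) tensor of word indices for one visit
        _, h_n = self.word_gru(self.embed(conversation))   # h_n: (1, num_turns, hidden_dim)
        turn_vectors = h_n[-1].unsqueeze(0)                # (1, num_turns, hidden_dim)
        turn_states, _ = self.turn_gru(turn_vectors)       # contextualize turns across the visit
        return self.classifier(turn_states).squeeze(0)     # (num_turns, num_topics) logits

# Example: a 5-turn visit, up to 12 words per turn, 1,000-word vocabulary
model = HierarchicalGRU(vocab_size=1000)
logits = model(torch.randint(0, 1000, (5, 12)))
print(logits.shape)  # torch.Size([5, 36])
```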
Errors in labeling conversation topics increased when topics were similar, such as diet and exercise.
Limitations
The accuracy of NLP models depends on the training data. NLP models may be less accurate for topics that arise infrequently in patient-clinician conversations.
Conclusions and Relevance
Machine learning NLP models can automatically label topics discussed during patient-clinician conversations, facilitating the analysis of these conversations to improve quality of care.
Future Research Needs
Future research could develop ways to improve the accuracy of NLP models by using other training data, such as clinicians’ notes.
Final Research Report
View this project's final research report.
Peer-Review Summary
Peer review of PCORI-funded research helps make sure the report presents complete, balanced, and useful information about the research. It also assesses how the project addressed PCORI’s Methodology Standards. During peer review, experts read a draft report of the research and provide comments about the report. These experts may include a scientist focused on the research topic, a specialist in research methods, a patient or caregiver, and a healthcare professional. These reviewers cannot have conflicts of interest with the study.
The peer reviewers point out where the draft report may need revision. For example, they may suggest ways to improve descriptions of the conduct of the study or to clarify the connection between results and conclusions. Sometimes, awardees revise their draft reports twice or more to address all of the reviewers’ comments.
Peer reviewers commented and the researchers made changes or provided responses. Those comments and responses included the following:
- The reviewers asked for clarity regarding the goals of the project. They noted that although the report alludes to how providers use their time in a clinic visit and to understanding the contents of a clinic visit, these concepts are not addressed further in the report. The researchers explained that they mentioned time use and visit content as examples of questions that the methods developed in this study could be used to address. The researchers also revised their conclusions to describe more directly how the study results can inform providers about how to use their time and gain a greater understanding of visit content.
- Reviewers asked why, if a goal of the study was to evaluate emotions during the clinic visit, coders used transcripts of the visits rather than audio recordings. The researchers explained that they did not have access to the audio recordings and confirmed that the transcripts still provided important emotional information.
- The reviewers appreciated the extent of community involvement in this study and the fact that patients were able to see the future use of natural language processing as a feedback mechanism that empowers patients to advocate for their own health care. The reviewers suggested that the researchers consider asking patients to review the visit transcripts and evaluate how well these methods captured the content and emotional valence of the doctor-patient interaction. The researchers noted that they did involve patients in informal meetings where the content of clinic visits was discussed but had not made such patient review a formal part of the process. The researchers agreed that this would be an interesting and useful addition to their study methods in the future.
- The reviewers asked the researchers to provide some estimate of what level of accuracy would indicate that the natural language processing methods are good enough to assess the content and emotional valence of doctor-patient interactions. The researchers agreed that this might be helpful but stated that a measure of accuracy alone would not necessarily show whether a method was good enough. They gave an example from the study results, where their method had no more than 62 percent accuracy for topic classification but matched the human-assigned labels quite well, including by making similar errors.