Automating Emergency Medicine Documentation Using LLMs with Retrieval-Augmented Text Generation: Analytical Study
Version
Submitted
Date Issued
2024
Author(s)
Type
Article
Language
English
Abstract
Background:
In healthcare settings, especially in high-pressure environments such as emergency care, the ability to document and communicate patient information rapidly and accurately is crucial. Traditional manual documentation methods are time-consuming and prone to errors, which can adversely affect patient outcomes. To address these challenges, there is growing interest in integrating advanced technologies, especially large language models (LLMs), into medical communication systems. However, deploying LLMs in clinical environments presents unique challenges, including the need to ensure the accuracy of medical content and to mitigate the risk of generating irrelevant or misleading information.
Objective:
This paper aims to address these challenges by developing a natural language processing (NLP) pipeline for extracting structured information from German rescue service treatment dialogues. The objectives are twofold: (1) to generate realistic, medically relevant dialogues for which the ground truth is known, and (2) to accurately extract essential information from these dialogues to populate emergency protocols.
Methods:
This study utilizes the MIMIC-IV-ED dataset, a de-identified, publicly available resource, to generate synthetic dialogue data for emergency department scenarios. By selecting and anonymizing data from 100 patients, we created a baseline for generating realistic dialogues and evaluating an NLP pipeline. We applied the Post Randomization Method (PRAM) for non-mechanical data perturbation, ensuring patient privacy while preserving data utility. Dialogue generation proceeded in two stages: initial generation using the Zephyr-7b-beta model, followed by refinement and translation into German using GPT-4 Turbo. A Retrieval-Augmented Generation (RAG) approach was developed to extract relevant information from these dialogues, involving chunking, embedding, and dynamic prompt templates. The quality of the generated dialogues was evaluated through manual review and sentiment analysis, ensuring that they maintained clinical relevance and emotional accuracy.
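The retrieval step named above (chunking a dialogue, embedding the chunks, and retrieving the most relevant ones for a prompt template) can be sketched as follows. This is a minimal illustration only: the chunk size, overlap, and the bag-of-words similarity used here as a stand-in for a learned embedding model are assumptions, not the configuration reported in the study.

```python
import math
import re
from collections import Counter

def chunk_dialogue(text, size=8, overlap=2):
    """Split a dialogue transcript into overlapping word-window chunks."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Toy stand-in for a sentence embedding: a lowercase word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query, for use in a prompt template."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

dialogue = ("Patient reports chest pain since this morning. "
            "Pain score is seven out of ten. "
            "No known allergies. Blood pressure 140 over 90.")
chunks = chunk_dialogue(dialogue)
top = retrieve("What is the pain score?", chunks, k=1)
```

In a full RAG pipeline, the retrieved chunks would be interpolated into a feature-specific prompt template and passed to the LLM for extraction; here the retrieval alone is shown.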
Results:
The data generation pipeline produced 100 dialogues, with initial English dialogues averaging 2,000 tokens and German dialogues 4,000 tokens. Manual evaluation identified certain redundancies and overly formal language in the German dialogues. Sentiment analysis revealed a reduction in negative sentiment from 67% to 59% and an increase in positive sentiment from 27% to 38%, a shift that may hinder text extraction, as positive sentiment aligns poorly with identifying critical topics such as suicidal thoughts. The RAG-based extraction system achieved high precision and recall for both nominal and numerical features in the initial dialogues, with F1-scores ranging from 86.21% to 100%. However, performance declined in the refined dialogues, with notable drops in precision, particularly for "Diagnosis" (60.82%) and "Pain Score" (57.61%).
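For context, the precision, recall, and F1 figures quoted above follow the standard definitions over true-positive, false-positive, and false-negative extraction counts. The counts in the sketch below are illustrative, not taken from the study:

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical tallies for one extracted feature across 100 dialogues:
# 90 correctly extracted, 5 spurious extractions, 10 missed.
p, r, f = prf1(tp=90, fp=5, fn=10)
```

Because F1 is the harmonic mean of precision and recall, a drop in precision for a single feature (as reported for "Diagnosis" and "Pain Score") pulls its F1 down sharply even when recall stays high.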
Conclusions:
The results of the study underscore the system's robust capabilities in processing structured data efficiently, demonstrating its strength in managing well-defined, quantitative information. However, the findings also reveal limitations in the system's ability to handle nuanced clinical language, particularly in languages other than English and Chinese, such as German.
Publisher DOI
Journal or Series
JMIR Medical Informatics
ISSN
2291-9694
Publisher URL
Publisher
JMIR Publications
Submitter
Sariyar, Murat
Citation apa
Moser, D. S., Bender, M., & Sariyar, M. (2024). Automating Emergency Medicine Documentation Using LLMs with Retrieval-Augmented Text Generation: Analytical Study. JMIR Publications. https://doi.org/10.24451/dspace/11441
File(s)
restricted
Name
preprint-65483-submitted.pdf
License
Publisher
Version
Submitted
Size
1.06 MB
Format
Adobe PDF
Checksum (MD5)
8944183873f816a06f8956e0e18bf7f3
