Scientific Evidence for Clinical Text Summarization Using Large Language Models: Scoping Review

URI
https://arbor.bfh.ch/handle/arbor/45313
Version
Published
Identifiers
10.2196/68998
Date Issued
2025
Author(s)
Bednarczyk, Lydie
Reichenpfader, Daniel  
Gaudet-Blavignac, Christophe
Type
Article
Language
English
Subjects

summarization

large language models...

natural language proc...

health care

electronic health rec...

scoping review

translational researc...

artificial intelligen...

Abstract
Background: Information overload in electronic health records requires effective solutions to alleviate clinicians’ administrative tasks. Automatically summarizing clinical text has gained significant attention with the rise of large language models. While individual studies show optimism, a structured overview of the research landscape is lacking.
Objective: This study aims to present the current state of the art on clinical text summarization using large language models, evaluate the level of evidence in existing research and assess the applicability of performance findings in clinical settings.
Methods: This scoping review complied with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. Literature published between January 1, 2019, and June 18, 2024, was identified from 5 databases: PubMed, Embase, Web of Science, IEEE Xplore, and ACM Digital Library. Studies were excluded if they did not describe transformer-based models, did not focus on clinical text summarization, did not engage with free-text data, were not original research, were nonretrievable, were not peer-reviewed, or were not in English, French, Spanish, or German. Data related to study context and characteristics, scope of research, and evaluation methodologies were systematically collected and analyzed by 3 authors independently.
Results: A total of 30 original studies were included in the analysis. All used observational retrospective designs, mainly using real patient data (n=28, 93%). The research landscape demonstrated a narrow research focus, often centered on summarizing radiology reports (n=17, 57%), primarily involving data from the intensive care unit (n=15, 50%) of US-based institutions (n=19, 73%), in English (n=26, 87%). This focus aligned with the frequent reliance on the open-source Medical Information Mart for Intensive Care dataset (n=15, 50%). Summarization methodologies predominantly involved abstractive approaches (n=17, 57%) on single-document inputs (n=4, 13%) with unstructured data (n=13, 43%), yet reporting on methodological details remained inconsistent across studies. Model selection involved both open-source models (n=26, 87%) and proprietary models (n=7, 23%). Evaluation frameworks were highly heterogeneous. All studies conducted internal validation, but external validation (n=2, 7%), failure analysis (n=6, 20%), and patient safety risks analysis (n=1, 3%) were infrequent, and none reported bias assessment. Most studies used both automated metrics and human evaluation (n=16, 53%), while 10 (33%) used only automated metrics, and 4 (13%) only human evaluation.
Conclusions: Key barriers hinder the translation of current research into trustworthy, clinically valid applications. Current research remains exploratory and limited in scope, with many applications yet to be explored. Performance assessments often lack reliability, and clinical impact evaluations are insufficient, raising concerns about model utility, safety, fairness, and data privacy. Advancing the field requires more robust evaluation frameworks, a broader research scope, and a stronger focus on real-world applicability.
DOI
https://doi.org/10.24451/dspace/11944
Publisher DOI
10.2196/68998
Journal or Series
Journal of Medical Internet Research
ISSN
1439-4456
Publisher URL
https://www.jmir.org/2025/1/e68998
Organization
Technik und Informatik  
Volume
27
Publisher
JMIR Publications
Submitter
Reichenpfader, Daniel
Citation (APA)
Bednarczyk, L., Reichenpfader, D., & Gaudet-Blavignac, C. (2025). Scientific Evidence for Clinical Text Summarization Using Large Language Models: Scoping Review. Journal of Medical Internet Research, 27, e68998. https://doi.org/10.24451/dspace/11944
File(s)

open access

Name

jmir-2025-1-e68998.pdf

License
Attribution 4.0 International
Version
published
Size

894.64 KB

Format

Adobe PDF

Checksum (MD5)

831351c48fa6e1e7eccd66efbf91a116
