Enhancing clinical documentation with voice processing and large language models: a study on the LAOS system

Yupeng Xu, Huixun Jia, Maolin Wang, Jie Feng, Xun Xu, Haiyan Wang, Jieqiong Chen, Zheng Zheng, Xiaoyan Yang, Yue Shen, Jian Wang, Chenyi Zhuang, Peng Wei, Ruocheng Guo, Xiangyu Zhao, Junxiang Fan*, Xiaodong Sun*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The growing volume of Electronic Health Records (EHRs) has enhanced patient care quality but significantly increased the cognitive workload on clinicians, particularly in ophthalmology, where specialists handle 1.6 times more patient consultations than those in other specialties. This study introduces the “LLM-based Auxiliary Ophthalmic System (LAOS),” an integrated framework that leverages Large Language Models (LLMs) and audio processing to improve the accuracy and efficiency of clinical documentation. LAOS combines voice recognition with Retrieval-Augmented Generation (RAG) and Low-Rank Adaptation (LoRA) to convert clinical conversations into structured documentation while dynamically retrieving relevant medical knowledge. The system was evaluated on three critical documentation tasks: Admission Reports, Surgery Records, and Discharge Summaries. In both quantitative evaluation (BLEU, ROUGE-L, BERTScore) and clinical validation by board-certified physicians, LAOS demonstrated significant improvements in documentation completeness, accuracy, and efficiency. While challenges remain in balancing comprehensiveness with conciseness, this research highlights the potential of speech-enabled LLM systems to alleviate physician burnout, enhance documentation quality, and improve healthcare delivery.

Original language: English
Article number: 798
Journal: npj Digital Medicine
Volume: 8
Issue number: 1
DOIs
State: Published - Dec 2025
Externally published: Yes
