
LLM4CGDS: Large language model-based agents for Chinese graded document simplification

  • Dengzhao Fang
  • Jipeng Qiang*
  • Wenjie Hou
  • Yi Zhu
  • Jingtong Gao
  • Xiangyu Zhao

*Corresponding author for this work

  • Yangzhou University
  • School of Artificial Intelligence
  • City University of Hong Kong

Research output: Contribution to journal › Article › peer-review

Abstract

Graded reading tailors text difficulty to learners' proficiency by producing multiple versions of the same content, an approach long embraced in language education but still dependent on labor-intensive, expert-driven adaptation. In this paper, we introduce the task of Chinese Graded Document Simplification (CGDS) for non-native learners, which seeks to automate the creation of multi-level reading materials in accordance with established proficiency standards. Guided by the three stages of the Hanyu Shuiping Kaoshi (HSK) 3.0 framework (Levels 1–3 for Beginner, Levels 4–6 for Intermediate, and Levels 7–9 for Advanced learners), we propose Large Language Model for Chinese Graded Document Simplification (LLM4CGDS), a rule-guided, large language model (LLM)-based framework that integrates HSK-level readability constraints and external knowledge retrieval to control document-level simplification without requiring supervised fine-tuning. To foster further research, we construct two complementary datasets, Journey to the West Document Simplification (JWDS) and Multi-Domain Document Simplification (MDDS), which cover diverse genres and difficulty levels. Experimental evaluation on both datasets demonstrates that LLM4CGDS substantially outperforms direct prompting of state-of-the-art LLMs in both readability control and meaning preservation.
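As a rough illustration of what an "HSK-level readability constraint" could mean in practice, the sketch below maps the nine HSK 3.0 levels onto their three stages and flags words that lie above a target stage. It is not the LLM4CGDS implementation. The function names (`stage_of`, `out_of_level_words`, `coverage`), the toy lexicon and its level values, and the policy of treating unknown words as out of level are all hypothetical assumptions, and the input is assumed to be already word-segmented.

```python
# Hypothetical sketch of a vocabulary-level readability check; not the LLM4CGDS code.

# HSK 3.0 groups its nine levels into three stages.
STAGES = {
    "Beginner": range(1, 4),      # Levels 1-3
    "Intermediate": range(4, 7),  # Levels 4-6
    "Advanced": range(7, 10),     # Levels 7-9
}

# Toy lexicon: the level values are illustrative only, NOT taken from the
# official HSK 3.0 word list.
TOY_LEXICON = {"我": 1, "喜欢": 1, "读": 1, "书": 1, "小说": 3, "典籍": 7}


def stage_of(level: int) -> str:
    """Return the HSK 3.0 stage name for a level from 1 to 9."""
    for name, levels in STAGES.items():
        if level in levels:
            return name
    raise ValueError(f"HSK level must be 1-9, got {level}")


def out_of_level_words(tokens, lexicon, target_stage):
    """Return tokens whose level exceeds the highest level of the target stage.

    Assumption: tokens missing from the lexicon are treated as out of level
    (level 10), so they are always flagged.
    """
    max_level = max(STAGES[target_stage])
    return [t for t in tokens if lexicon.get(t, 10) > max_level]


def coverage(tokens, lexicon, target_stage):
    """Fraction of tokens that are within the target stage (1.0 for empty input)."""
    if not tokens:
        return 1.0
    flagged = out_of_level_words(tokens, lexicon, target_stage)
    return 1 - len(flagged) / len(tokens)


if __name__ == "__main__":
    sentence = ["我", "喜欢", "读", "典籍"]  # pre-segmented tokens (assumption)
    print(out_of_level_words(sentence, TOY_LEXICON, "Beginner"))  # ['典籍']
    print(coverage(sentence, TOY_LEXICON, "Beginner"))            # 0.75
```

In a simplification loop, a check like this could decide whether a rewritten document still needs another pass for a given stage; how LLM4CGDS actually enforces its constraints is described in the article itself.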

Original language: English
Article number: 113905
Journal: Engineering Applications of Artificial Intelligence
Volume: 169
State: Published - 1 Apr 2026
Externally published: Yes

Keywords

  • Graded reading
  • Hanyu Shuiping Kaoshi
  • Large language modeling
  • Text simplification
