TY - JOUR
T1 - LLM4CGDS
T2 - Large language model-based agents for Chinese graded document simplification
AU - Fang, Dengzhao
AU - Qiang, Jipeng
AU - Hou, Wenjie
AU - Zhu, Yi
AU - Gao, Jingtong
AU - Zhao, Xiangyu
N1 - Publisher Copyright:
Copyright © 2026. Published by Elsevier Ltd.
PY - 2026/4/1
Y1 - 2026/4/1
N2 - Graded reading tailors text difficulty to learners’ proficiency by producing multiple versions of the same content, an approach long embraced in language education but still dependent on labor-intensive, expert-driven adaptation. In this paper, we introduce the task of Chinese Graded Document Simplification (CGDS) for non-native learners, which seeks to automate the creation of multi-level reading materials in accordance with established proficiency standards. Guided by the three stages of the Hanyu Shuiping Kaoshi (HSK) 3.0 framework (Levels 1–3 for Beginner, Levels 4–6 for Intermediate, and Levels 7–9 for Advanced learners), we propose Large Language Model for Chinese Graded Document Simplification (LLM4CGDS), a rule-guided, large language model (LLM)-based framework that integrates HSK-level readability constraints and external knowledge retrieval to control document-level simplification without requiring supervised fine-tuning. To foster further research, we construct two complementary datasets, Journey to the West Document Simplification (JWDS) and Multi-Domain Document Simplification (MDDS), which cover diverse genres and difficulty levels. Experimental evaluation on both datasets demonstrates that LLM4CGDS substantially outperforms direct prompting of state-of-the-art LLMs in both readability control and meaning preservation.
KW - Graded reading
KW - Hanyu Shuiping Kaoshi
KW - Large language modeling
KW - Text simplification
UR - https://www.scopus.com/pages/publications/105029351100
U2 - 10.1016/j.engappai.2026.113905
DO - 10.1016/j.engappai.2026.113905
M3 - Article
AN - SCOPUS:105029351100
SN - 0952-1976
VL - 169
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 113905
ER -