ScholarGEC: Enhancing Controllability of Large Language Model for Chinese Academic Grammatical Error Correction

Abstract

Large language models (LLMs) have demonstrated exceptional error detection capabilities and can correct sentences with high fluency in grammatical error correction (GEC) tasks. However, when correcting Chinese academic papers, LLMs are prone to over-correction. We trace this problem to two causes. On one hand, each discipline has its own vocabulary and expressions, and LLMs have an insufficient and incomplete understanding of domain-specific sentences. On the other hand, the controllability of generative LLMs in GEC tasks is inherently poor, and the traditional sequence-to-sequence (Seq2Seq) correction structure exacerbates this issue. To address both factors, we propose ScholarGEC, a new LLM-based error correction framework for Chinese academic GEC tasks. To improve LLMs’ understanding of domain-specific knowledge, we construct appropriate disciplinary knowledge prefixes for sentences and use this domain-specific knowledge data to fine-tune the LLM. To enhance the controllability of LLMs, we replace the traditional Seq2Seq structure with a structure that separates detection from correction. We also introduce a special token during the process to improve the stability of the model’s error detection. Additionally, we incorporate iterative self-reflection into the three parts of LLM generation to enhance the stability of the generated output. Extensive experiments on a Chinese GEC dataset composed of academic papers demonstrate the effectiveness and robustness of our framework, and further analysis shows that the framework also enhances LLM performance in general GEC tasks.
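
The detection-correction separation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `<ERR>` marker, the function names, the way the disciplinary prefix is passed, and the accept/regenerate loop standing in for iterative self-reflection are all assumptions made for illustration. The model calls are plain callables so that the control flow can be read without any particular LLM API.

```python
from typing import Callable

ERR = "<ERR>"  # hypothetical special token; the paper's actual token is not given here


def retry_until_accepted(generate: Callable[[], str],
                         accept: Callable[[str], bool],
                         max_rounds: int = 3) -> str:
    """Regenerate until `accept` passes or the round budget is spent."""
    output = generate()
    for _ in range(max_rounds - 1):
        if accept(output):
            break
        output = generate()
    return output


def correct_sentence(sentence: str,
                     prefix: str,
                     detect: Callable[[str, str], str],
                     correct: Callable[[str, str], str],
                     accept: Callable[[str], bool],
                     max_rounds: int = 3) -> str:
    """Detect first, then correct only if something was flagged."""
    # Stage 1: detection. `detect` is assumed to return the sentence with
    # ERR inserted in front of each suspected error.
    marked = detect(prefix, sentence)
    if ERR not in marked:
        # Nothing flagged: return the input unchanged, so the generator
        # never gets a chance to rewrite a correct sentence.
        return sentence
    # Stage 2: correction, conditioned on the marked sentence, with a
    # simple accept/regenerate loop standing in for self-reflection.
    return retry_until_accepted(lambda: correct(prefix, marked), accept, max_rounds)
```

In this sketch the early return is what limits over-correction: a sentence the detector does not flag is never passed to the generator at all, whereas a single Seq2Seq pass would regenerate every input.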

Cite

Text

Kong et al. "ScholarGEC: Enhancing Controllability of Large Language Model for Chinese Academic Grammatical Error Correction." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I23.34611

Markdown

[Kong et al. "ScholarGEC: Enhancing Controllability of Large Language Model for Chinese Academic Grammatical Error Correction." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/kong2025aaai-scholargec/) doi:10.1609/AAAI.V39I23.34611

BibTeX

@inproceedings{kong2025aaai-scholargec,
  title     = {{ScholarGEC: Enhancing Controllability of Large Language Model for Chinese Academic Grammatical Error Correction}},
  author    = {Kong, Zixiao and Wang, Xianquan and Shen, Shuanghong and Zhu, Keyu and Xu, Huibo and Su, Yu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {24339-24347},
  doi       = {10.1609/AAAI.V39I23.34611},
  url       = {https://mlanthology.org/aaai/2025/kong2025aaai-scholargec/}
}