A Unified Strategy for Multilingual Grammatical Error Correction with Pre-Trained Cross-Lingual Language Model

Xin Sun, Tao Ge, Shuming Ma, Jingjing Li, Furu Wei, Houfeng Wang

IJCAI 2022 pp. 4367-4374

doi:10.24963/IJCAI.2022/606

Abstract

Synthetic data construction for Grammatical Error Correction (GEC) in non-English languages relies heavily on human-designed, language-specific rules, which produce only a limited range of error-correction patterns. In this paper, we propose a generic and language-independent strategy for multilingual GEC, which can train a GEC system effectively for a new non-English language with only two easy-to-access resources: 1) a pre-trained cross-lingual language model (PXLM) and 2) parallel translation data between English and that language. Our approach creates diverse parallel GEC data without any language-specific operations by taking the non-autoregressive translation generated by the PXLM and the gold translation as error-corrected sentence pairs. We then reuse the PXLM to initialize the GEC model and pre-train it with the synthetic data generated by the PXLM itself, which yields further improvement. We evaluate our approach on three public GEC benchmarks in different languages. It achieves state-of-the-art results on the NLPCC 2018 Task 2 dataset (Chinese) and obtains competitive performance on Falko-Merlin (German) and RULEC-GEC (Russian). Further analysis demonstrates that our data construction method is complementary to rule-based approaches.

Cite

Text

Sun et al. "A Unified Strategy for Multilingual Grammatical Error Correction with Pre-Trained Cross-Lingual Language Model." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/606

Markdown

[Sun et al. "A Unified Strategy for Multilingual Grammatical Error Correction with Pre-Trained Cross-Lingual Language Model." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/sun2022ijcai-unified/) doi:10.24963/IJCAI.2022/606

BibTeX

@inproceedings{sun2022ijcai-unified,
  title     = {{A Unified Strategy for Multilingual Grammatical Error Correction with Pre-Trained Cross-Lingual Language Model}},
  author    = {Sun, Xin and Ge, Tao and Ma, Shuming and Li, Jingjing and Wei, Furu and Wang, Houfeng},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {4367-4374},
  doi       = {10.24963/IJCAI.2022/606},
  url       = {https://mlanthology.org/ijcai/2022/sun2022ijcai-unified/}
}