A Unified Strategy for Multilingual Grammatical Error Correction with Pre-Trained Cross-Lingual Language Model
Abstract
Synthetic data construction for Grammatical Error Correction (GEC) in non-English languages relies heavily on human-designed, language-specific rules, which produce limited error-corrected patterns. In this paper, we propose a generic, language-independent strategy for multilingual GEC that can train a GEC system effectively for a new non-English language with only two easy-to-access resources: 1) a pre-trained cross-lingual language model (PXLM) and 2) parallel translation data between English and that language. Our approach creates diverse parallel GEC data without any language-specific operations by taking the non-autoregressive translation generated by the PXLM and the gold translation as error-corrected sentence pairs. We then reuse the PXLM to initialize the GEC model and pre-train it with the synthetic data the PXLM itself generated, which yields further improvement. We evaluate our approach on three public GEC benchmarks in different languages. It achieves state-of-the-art results on the NLPCC 2018 Task 2 dataset (Chinese) and obtains competitive performance on Falko-Merlin (German) and RULEC-GEC (Russian). Further analysis demonstrates that our data construction method is complementary to rule-based approaches.
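The data construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_gec_pairs`, the `translate` callable, and the toy German examples are hypothetical, and skipping outputs that already match the gold translation is an assumption made here for illustration. In practice `translate` would wrap non-autoregressive decoding with a PXLM.

```python
from typing import Callable, Iterable, List, Tuple


def build_gec_pairs(
    parallel: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each model translation (erroneous side) with the gold translation (corrected side)."""
    pairs = []
    for english, gold in parallel:
        hypothesis = translate(english).strip()
        gold = gold.strip()
        # Assumption: empty outputs and exact matches carry no correction
        # signal, so they are skipped.
        if hypothesis and hypothesis != gold:
            pairs.append((hypothesis, gold))
    return pairs


def toy_translate(english: str) -> str:
    """Hypothetical stand-in for a PXLM translation; returns canned outputs."""
    canned = {
        "I have two cats.": "Ich haben zwei Katze.",  # imperfect translation
        "She is reading.": "Sie liest.",              # already matches gold
    }
    return canned.get(english, "")


parallel_data = [
    ("I have two cats.", "Ich habe zwei Katzen."),
    ("She is reading.", "Sie liest."),
]

pairs = build_gec_pairs(parallel_data, toy_translate)
for source, target in pairs:
    print(f"{source} -> {target}")
```

Only the first sentence yields a training pair here, because the second toy translation is identical to its gold reference.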
Cite
Text
Sun et al. "A Unified Strategy for Multilingual Grammatical Error Correction with Pre-Trained Cross-Lingual Language Model." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/606
Markdown
[Sun et al. "A Unified Strategy for Multilingual Grammatical Error Correction with Pre-Trained Cross-Lingual Language Model." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/sun2022ijcai-unified/) doi:10.24963/IJCAI.2022/606
BibTeX
@inproceedings{sun2022ijcai-unified,
title = {{A Unified Strategy for Multilingual Grammatical Error Correction with Pre-Trained Cross-Lingual Language Model}},
author = {Sun, Xin and Ge, Tao and Ma, Shuming and Li, Jingjing and Wei, Furu and Wang, Houfeng},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2022},
pages = {4367--4374},
doi = {10.24963/IJCAI.2022/606},
url = {https://mlanthology.org/ijcai/2022/sun2022ijcai-unified/}
}