Towards Making the Most of BERT in Neural Machine Translation

Abstract

GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various natural language processing tasks. However, LM fine-tuning often suffers from catastrophic forgetting when applied to resource-rich tasks. In this work, we introduce a concerted training framework (CTnmt) for integrating pre-trained LMs into neural machine translation (NMT). Our proposed CTnmt consists of three techniques: a) asymptotic distillation, which ensures that the NMT model retains the pre-trained knowledge; b) a dynamic switching gate, which avoids catastrophic forgetting of pre-trained knowledge; and c) a strategy that adjusts the learning paces according to a scheduled policy. In our machine translation experiments, CTnmt gains up to 3 BLEU points on the WMT14 English-German language pair, surpassing the previous state-of-the-art pre-training-aided NMT model by 1.4 BLEU points. On the large WMT14 English-French task with 40 million sentence pairs, our base model still significantly improves upon the state-of-the-art Transformer big model by more than 1 BLEU point.
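
The abstract only names the two fusion techniques, so here is a minimal PyTorch sketch of how a dynamic switching gate and an asymptotic distillation term could look. The class and function names (DynamicSwitch, asymptotic_distillation_loss), the MSE form of the distillation term, and the mixing weight alpha are illustrative assumptions, not the paper's released code; rate-scheduled updating (technique c) would additionally assign the pre-trained parameters their own, slower learning-rate schedule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicSwitch(nn.Module):
    """Gate that mixes frozen BERT features with NMT encoder states element-wise."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.w_lm = nn.Linear(hidden_size, hidden_size)
        self.w_nmt = nn.Linear(hidden_size, hidden_size)

    def forward(self, h_lm: torch.Tensor, h_nmt: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides, per hidden unit, how much pre-trained
        # knowledge to let through versus the task-specific encoding.
        g = torch.sigmoid(self.w_lm(h_lm) + self.w_nmt(h_nmt))
        return g * h_lm + (1.0 - g) * h_nmt

def asymptotic_distillation_loss(h_nmt: torch.Tensor,
                                 h_lm: torch.Tensor,
                                 nmt_loss: torch.Tensor,
                                 alpha: float = 0.9) -> torch.Tensor:
    # Pull the NMT encoder states toward the (detached) LM states so the
    # pre-trained knowledge is retained while the translation objective
    # is optimized; alpha trades off the two terms (assumed value here).
    distill = F.mse_loss(h_nmt, h_lm.detach())
    return alpha * nmt_loss + (1.0 - alpha) * distill

# Usage sketch: a batch of 2 sentences, 5 tokens each, hidden size 8.
h_lm = torch.randn(2, 5, 8)    # stand-in for BERT outputs
h_nmt = torch.randn(2, 5, 8)   # stand-in for NMT encoder states
fused = DynamicSwitch(8)(h_lm, h_nmt)  # would feed the NMT decoder
loss = asymptotic_distillation_loss(h_nmt, h_lm, nmt_loss=torch.tensor(1.0))
```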

Cite

Text

Yang et al. "Towards Making the Most of BERT in Neural Machine Translation." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I05.6479

Markdown

[Yang et al. "Towards Making the Most of BERT in Neural Machine Translation." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/yang2020aaai-making/) doi:10.1609/AAAI.V34I05.6479

BibTeX

@inproceedings{yang2020aaai-making,
  title     = {{Towards Making the Most of BERT in Neural Machine Translation}},
  author    = {Yang, Jiacheng and Wang, Mingxuan and Zhou, Hao and Zhao, Chengqi and Zhang, Weinan and Yu, Yong and Li, Lei},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {9378--9385},
  doi       = {10.1609/AAAI.V34I05.6479},
  url       = {https://mlanthology.org/aaai/2020/yang2020aaai-making/}
}