Efficient Context-Aware Neural Machine Translation with Layer-Wise Weighting and Input-Aware Gating
Abstract
Existing Neural Machine Translation (NMT) systems are generally trained on a large amount of sentence-level parallel data, and during prediction sentences are translated independently, ignoring cross-sentence contextual information. This leads to inconsistencies between translated sentences. To address this issue, context-aware models have been proposed. However, document-level parallel data constitutes only a small fraction of the available parallel data, and many approaches build context-aware models on top of a pre-trained, frozen sentence-level translation model in a two-step training procedure. The computational cost of these approaches is usually high. In this paper, we propose to make the most of layers pre-trained on sentence-level data for contextual representation learning, reusing representations from the sentence-level Transformer and significantly reducing the cost of incorporating contexts into translation. We find that representations from shallow layers of a pre-trained sentence-level encoder play a vital role in source context encoding, and propose to perform source context encoding on weighted combinations of the pre-trained encoder layers' outputs. Instead of encoding the source context and the source input separately, we propose to iteratively and jointly encode the source input and its contexts and to generate input-aware context representations with a cross-attention layer and a gating mechanism that resets irrelevant information in context encoding. Our context-aware Transformer model outperforms the recent CADec [Voita et al., 2019c] on English-Russian subtitle data and is about twice as fast in training and decoding.
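A minimal PyTorch sketch of the two ideas the abstract names: a learned, softmax-normalized weighting over the outputs of a pre-trained sentence-level encoder's layers for context encoding, and an input-aware gate that cross-attends from the current input to the context and resets irrelevant context features. The module names, shapes, and all implementation details here are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn


class LayerWiseWeighting(nn.Module):
    """Sketch: combine the outputs of the frozen sentence-level encoder's
    layers with learned softmax-normalized weights (layer-wise weighting)."""

    def __init__(self, num_layers):
        super().__init__()
        # One scalar weight per pre-trained encoder layer; zeros -> uniform softmax.
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_outputs):
        # layer_outputs: list of (batch, ctx_len, d_model) tensors, one per layer.
        stacked = torch.stack(layer_outputs, dim=0)             # (L, B, T, D)
        alpha = torch.softmax(self.weights, dim=0)              # (L,)
        return (alpha.view(-1, 1, 1, 1) * stacked).sum(dim=0)   # (B, T, D)


class InputAwareGating(nn.Module):
    """Sketch: cross-attend from the current source input to the weighted
    context representation, then gate (reset) context features that are
    irrelevant to the input (input-aware gating)."""

    def __init__(self, d_model, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, input_repr, context_repr):
        # input_repr:   (B, src_len, D) current-sentence encoder states
        # context_repr: (B, ctx_len, D) layer-weighted context encoder states
        attended, _ = self.cross_attn(input_repr, context_repr, context_repr)
        g = torch.sigmoid(self.gate(torch.cat([input_repr, attended], dim=-1)))
        return g * attended  # gate near 0 effectively drops irrelevant context
```

In this sketch the pre-trained encoder would stay frozen and only the layer weights, cross-attention, and gate are trained, which is one plausible reading of how reusing sentence-level representations keeps the added cost small.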
Cite
Text
Xu et al. "Efficient Context-Aware Neural Machine Translation with Layer-Wise Weighting and Input-Aware Gating." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/544
Markdown
[Xu et al. "Efficient Context-Aware Neural Machine Translation with Layer-Wise Weighting and Input-Aware Gating." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/xu2020ijcai-efficient/) doi:10.24963/IJCAI.2020/544
BibTeX
@inproceedings{xu2020ijcai-efficient,
title = {{Efficient Context-Aware Neural Machine Translation with Layer-Wise Weighting and Input-Aware Gating}},
author = {Xu, Hongfei and Xiong, Deyi and van Genabith, Josef and Liu, Qiuhui},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2020},
  pages = {3933--3940},
doi = {10.24963/IJCAI.2020/544},
url = {https://mlanthology.org/ijcai/2020/xu2020ijcai-efficient/}
}