Multilingual Neural Machine Translation with Soft Decoupled Encoding

Abstract

Multilingual training of neural machine translation (NMT) systems has led to impressive accuracy improvements on low-resource languages. However, there are still significant challenges in efficiently learning word representations in the face of a paucity of data. In this paper, we propose Soft Decoupled Encoding (SDE), a multilingual lexicon encoding framework specifically designed to share lexical-level information intelligently without requiring heuristic preprocessing such as pre-segmenting the data. SDE represents a word by its spelling through a character encoding, and by its semantic meaning through a latent embedding space shared by all languages. Experiments on a standard dataset of four low-resource languages show consistent improvements over strong multilingual NMT baselines, with gains of up to 2 BLEU on one of the tested languages, achieving a new state of the art on all four language pairs.
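The abstract's core idea, a lexical (spelling) view combined with a language-agnostic semantic view, can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the embedding dimensions, the character trigram encoding, the hash-based n-gram lookup, and the residual combination of the two views are all assumptions made here for brevity.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

D = 16        # embedding dimension (assumed for illustration)
N_LATENT = 8  # size of the shared latent semantic space (assumed)

# Latent semantic embeddings, shared across all languages.
latent = rng.normal(size=(N_LATENT, D))

def char_ngrams(word, n=3):
    """Character n-grams of the padded word capture its spelling."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def ngram_embedding(word):
    """Spelling encoding: average of per-n-gram vectors.

    A stable hash (crc32) seeds each n-gram's vector, standing in for a
    learned n-gram embedding table (an assumption of this sketch).
    """
    vecs = [np.random.default_rng(zlib.crc32(g.encode())).normal(size=D)
            for g in char_ngrams(word)]
    return np.mean(vecs, axis=0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sde_encode(word):
    # 1) Encode the word's spelling from its characters.
    c = ngram_embedding(word)
    # 2) Attend over the shared latent semantic embeddings.
    attn = softmax(latent @ c)       # (N_LATENT,)
    s = attn @ latent                # semantic view, (D,)
    # 3) Combine the two views (residual sum, an assumption here).
    return c + s

vec = sde_encode("translation")
print(vec.shape)  # (16,)
```

Because `latent` is shared, words with similar spellings in related languages attend to similar regions of the semantic space, which is the kind of cross-lingual sharing the abstract describes.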

Cite

Text

Wang et al. "Multilingual Neural Machine Translation with Soft Decoupled Encoding." International Conference on Learning Representations, 2019.

Markdown

[Wang et al. "Multilingual Neural Machine Translation with Soft Decoupled Encoding." International Conference on Learning Representations, 2019.](https://mlanthology.org/iclr/2019/wang2019iclr-multilingual/)

BibTeX

@inproceedings{wang2019iclr-multilingual,
  title     = {{Multilingual Neural Machine Translation with Soft Decoupled Encoding}},
  author    = {Wang, Xinyi and Pham, Hieu and Arthur, Philip and Neubig, Graham},
  booktitle = {International Conference on Learning Representations},
  year      = {2019},
  url       = {https://mlanthology.org/iclr/2019/wang2019iclr-multilingual/}
}