Monotonic Multihead Attention

Ma, Xutai; Pino, Juan; Cross, James; Puzon, Liezl; Gu, Jiatao

ICLR 2020

Abstract

Simultaneous machine translation models start generating a target sequence before they have read or encoded the entire source sequence. Recent approaches to this task either apply a fixed policy to the Transformer or use a learnable monotonic attention on a weaker, recurrent neural network-based architecture. In this paper, we propose a new attention mechanism, Monotonic Multihead Attention (MMA), which introduces the monotonic attention mechanism into multihead attention. We also introduce two novel, interpretable approaches to latency control that are specifically designed for multiple attention heads. We apply MMA to the simultaneous machine translation task and demonstrate better latency-quality tradeoffs than MILk, the previous state-of-the-art approach.
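
To make the core idea concrete, here is a minimal Python sketch of an inference-time read/write rule for hard monotonic multihead attention. It assumes each head h scans the source monotonically from where it stopped at the previous target step, computes a stopping probability p = sigmoid(energy) at each source position, and stops at the first position where p >= 0.5. The decoder is assumed to write the next target token only after every head has stopped, so the furthest-advanced head decides how many source tokens must be read. The `energy` callback, the function names, and the clamp-to-last-token behavior when a head never stops are hypothetical simplifications for illustration, not the paper's reference implementation. Training-time expected attention and the latency-control losses are not shown.

```python
import math
from typing import Callable, List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mma_hard_step(
    prev_positions: List[int],
    energy: Callable[[int, int], float],
    src_len: int,
) -> List[int]:
    """Advance every head's monotonic pointer for one target step.

    prev_positions[h] is the source index head h stopped at on the previous
    target step; energy(h, j) is a hypothetical monotonic energy for head h
    at source index j.
    """
    new_positions = []
    for h, j in enumerate(prev_positions):
        # Heads never move backwards: resume scanning from the previous stop.
        # Stop at the first index whose stopping probability is >= 0.5;
        # assumption: if no index qualifies, clamp to the last source token.
        while j < src_len - 1 and sigmoid(energy(h, j)) < 0.5:
            j += 1
        new_positions.append(j)
    return new_positions


def tokens_to_read(positions: List[int]) -> int:
    # The decoder writes only after all heads have stopped, so the
    # furthest-advanced head determines how many source tokens are needed.
    return max(positions) + 1


# Toy energies for two heads over a five-token source (made-up values).
energies = [
    [-1.0, 2.0, 0.0, 0.0, 0.0],
    [-1.0, -1.0, -1.0, 3.0, 0.0],
]
positions = mma_hard_step([0, 0], lambda h, j: energies[h][j], src_len=5)
print(positions, tokens_to_read(positions))  # [1, 3] 4
```

Here head 0 stops at index 1 and head 1 at index 3, so four source tokens must be read before the next target token is written. Letting the slowest head gate the write keeps every head's attention position valid, at the cost of extra latency when heads disagree. Latency-control terms that target this effect are not modeled here.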

Cite

Text

Ma et al. "Monotonic Multihead Attention." International Conference on Learning Representations, 2020.

Markdown

[Ma et al. "Monotonic Multihead Attention." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/ma2020iclr-monotonic/)

BibTeX

@inproceedings{ma2020iclr-monotonic,
  title     = {{Monotonic Multihead Attention}},
  author    = {Ma, Xutai and Pino, Juan and Cross, James and Puzon, Liezl and Gu, Jiatao},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/ma2020iclr-monotonic/}
}