Attention Is All You Need
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single-model state-of-the-art by 0.7 BLEU, achieving a BLEU score of 41.1.
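As an illustrative aside: the attention mechanism the abstract refers to is scaled dot-product attention, defined in the paper as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The NumPy sketch below is a minimal, self-contained rendering of that formula; the function name follows the paper's terminology, but the toy shapes and usage are illustrative assumptions, not from the paper.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # attention-weighted sum of the values

# Toy usage (hypothetical shapes): 4 query positions, 6 key/value positions, width 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)

Because the computation is a pair of matrix products rather than a sequential recurrence, all positions are processed at once, which is the source of the parallelizability claimed in the abstract.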
Cite
Text
Vaswani et al. "Attention Is All You Need." Neural Information Processing Systems, 2017.

Markdown
[Vaswani et al. "Attention Is All You Need." Neural Information Processing Systems, 2017.](https://mlanthology.org/neurips/2017/vaswani2017neurips-attention/)

BibTeX
@inproceedings{vaswani2017neurips-attention,
title = {{Attention Is All You Need}},
author = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser, Łukasz and Polosukhin, Illia},
booktitle = {Neural Information Processing Systems},
year = {2017},
pages = {5998--6008},
url = {https://mlanthology.org/neurips/2017/vaswani2017neurips-attention/}
}