Attention Is All You Need
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanisms. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters, achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single state-of-the-art with model by 0.7 BLEU, achieving a BLEU score of 41.1.
Cite
Text
Vaswani et al. "Attention Is All You Need." Neural Information Processing Systems, 2017.Markdown
[Vaswani et al. "Attention Is All You Need." Neural Information Processing Systems, 2017.](https://mlanthology.org/neurips/2017/vaswani2017neurips-attention/)BibTeX
@inproceedings{vaswani2017neurips-attention,
title = {{Attention Is All You Need}},
author = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, Łukasz and Polosukhin, Illia},
booktitle = {Neural Information Processing Systems},
year = {2017},
pages = {5998-6008},
url = {https://mlanthology.org/neurips/2017/vaswani2017neurips-attention/}
}