Attention and Augmented Recurrent Neural Networks

Abstract

The basic RNN design struggles with longer sequences, but a special variant, "long short-term memory" networks, can work with these. Such models have been found to be very powerful, achieving remarkable results in many tasks including translation, voice recognition, and image captioning. As a result, recurrent neural networks have become very widespread in the last few years. As this has happened, we've seen a growing number of attempts to augment RNNs with new properties. Four directions stand out as particularly exciting: Neural Turing Machines, attentional interfaces, adaptive computation time, and neural programmers.
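
To make the attention idea concrete, here is a minimal sketch of content-based soft attention, the mechanism behind attentional interfaces: score every item in a memory (for example, another RNN's encoder states) against a query, normalize the scores into a distribution, and read back a weighted blend. This is an illustrative NumPy sketch under our own assumptions, not the article's implementation; the names soft_attention and softmax and the toy dimensions are ours.

import numpy as np

def softmax(x):
    # Numerically stable softmax over a vector of scores.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def soft_attention(query, memory):
    # Content-based soft attention: score each memory slot against the
    # query, turn the scores into an attention distribution, and return
    # the weighted blend of the slots (attending everywhere, just to
    # different extents).
    scores = memory @ query          # (slots,) dot-product relevance scores
    weights = softmax(scores)        # attention distribution over slots
    context = weights @ memory       # blended read vector, same size as one slot
    return context, weights

# Toy usage: 5 memory slots (e.g. encoder RNN states) of dimension 8.
rng = np.random.default_rng(0)
memory = rng.normal(size=(5, 8))
query = rng.normal(size=(8,))
context, weights = soft_attention(query, memory)
print(weights.round(3), context.shape)

Because the attention weights are a differentiable function of the scores, the whole read can be trained with backpropagation, which is what distinguishes this soft attention from a hard, discrete lookup.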

Cite

Text

Olah and Carter. "Attention and Augmented Recurrent Neural Networks." Distill, 2016. doi:10.23915/distill.00001

Markdown

[Olah and Carter. "Attention and Augmented Recurrent Neural Networks." Distill, 2016.](https://mlanthology.org/distill/2016/olah2016distill-attention/) doi:10.23915/distill.00001

BibTeX

@article{olah2016distill-attention,
  title     = {{Attention and Augmented Recurrent Neural Networks}},
  author    = {Olah, Chris and Carter, Shan},
  journal   = {Distill},
  year      = {2016},
  doi       = {10.23915/distill.00001},
  url       = {https://mlanthology.org/distill/2016/olah2016distill-attention/}
}