Area Attention

Abstract

Existing attention mechanisms are trained to attend to individual items in a collection (the memory) with a predefined, fixed granularity, e.g., a word token or an image grid. We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences. Importantly, the shape and the size of an area are dynamically determined via learning, which enables a model to attend to information at varying granularity. Area attention works readily with existing model architectures such as multi-head attention, allowing a model to attend to multiple areas in the memory simultaneously. We evaluate area attention on two tasks, neural machine translation (both character- and token-level) and image captioning, and improve upon strong (state-of-the-art) baselines in all cases. These improvements are obtainable with a basic form of area attention that is parameter-free.
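
To make the basic, parameter-free form concrete, here is a minimal NumPy sketch for a 1D memory. It follows the paper's basic formulation, in which an area's key is the mean of its item keys and an area's value is the sum of its item values; the function name area_attention_1d and the max_area parameter are illustrative choices, not names from the paper.

import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def area_attention_1d(query, keys, values, max_area=3):
    # Enumerate all contiguous areas of up to max_area items.
    # Basic parameter-free form: area key = mean of item keys,
    # area value = sum of item values.
    n, d = keys.shape
    area_keys, area_values = [], []
    for size in range(1, max_area + 1):
        for start in range(n - size + 1):
            area_keys.append(keys[start:start + size].mean(axis=0))
            area_values.append(values[start:start + size].sum(axis=0))
    area_keys = np.stack(area_keys)      # (num_areas, d)
    area_values = np.stack(area_values)  # (num_areas, d)
    # Standard scaled dot-product attention over the derived areas.
    scores = area_keys @ query / np.sqrt(d)
    weights = softmax(scores)
    return weights @ area_values

# Example: a query attending over a 10-item memory with areas of up to 3 items.
rng = np.random.default_rng(0)
context = area_attention_1d(rng.normal(size=8),
                            rng.normal(size=(10, 8)),
                            rng.normal(size=(10, 8)))

Because every area key and value is derived from the item keys and values, this form adds no parameters, and capping the area size with max_area keeps the number of candidate areas linear in the memory length. The 2D case for images enumerates rectangular areas analogously.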

Cite

Text

Li et al. "Area Attention." International Conference on Machine Learning, 2019.

Markdown

[Li et al. "Area Attention." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/li2019icml-area/)

BibTeX

@inproceedings{li2019icml-area,
  title     = {{Area Attention}},
  author    = {Li, Yang and Kaiser, Lukasz and Bengio, Samy and Si, Si},
  booktitle = {International Conference on Machine Learning},
  year      = {2019},
  pages     = {3846--3855},
  volume    = {97},
  url       = {https://mlanthology.org/icml/2019/li2019icml-area/}
}