Frustratingly Short Attention Spans in Neural Language Modeling

Abstract

Neural language models predict the next token using a latent representation of the immediate token history. Recently, various methods for augmenting neural language models with an attention mechanism over a differentiable memory have been proposed. For predicting the next token, these models query information from a memory of the recent history, which can facilitate learning mid- and long-range dependencies. However, conventional attention mechanisms used in memory-augmented neural language models produce a single output vector per time step. This vector is used both for predicting the next token and as the key and value of a differentiable memory of the token history. In this paper, we propose a neural language model with a key-value attention mechanism that outputs separate representations for the key and value of a differentiable memory, as well as for encoding the next-word distribution. This model outperforms existing memory-augmented neural language models on two corpora. Yet, we found that our method mainly utilizes a memory of the five most recent output representations. This led to the unexpected main finding that a much simpler model based only on the concatenation of recent output representations from previous time steps is on par with more sophisticated memory-augmented neural language models.
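The sketch below illustrates the kind of key-value(-predict) attention the abstract describes: the RNN output at each step is split into a key, a value, and a predict part, attention over the keys of the last few steps retrieves a weighted sum of past values, and that context is combined with the predict part to form the representation used for the next-word softmax. This is a minimal illustration, not the authors' code; the layer names, the fixed attention window, and the fallback for the first steps are assumptions made here for clarity.

```python
# Minimal sketch of key-value-predict attention over a short window of past steps.
# Hypothetical module; dimensions and scoring follow a standard additive-attention form.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KeyValuePredictAttention(nn.Module):
    def __init__(self, hidden_size: int, window: int = 5):
        super().__init__()
        self.part = hidden_size // 3          # RNN output is split into key / value / predict
        self.window = window                  # number of recent steps kept in memory (assumed)
        self.W_y = nn.Linear(self.part, self.part, bias=False)  # transforms memory keys
        self.W_h = nn.Linear(self.part, self.part, bias=False)  # transforms current key
        self.w = nn.Linear(self.part, 1, bias=False)            # scoring vector
        self.W_r = nn.Linear(self.part, self.part, bias=False)  # transforms attended values
        self.W_x = nn.Linear(self.part, self.part, bias=False)  # transforms predict part

    def forward(self, outputs: torch.Tensor) -> torch.Tensor:
        """outputs: [batch, seq_len, 3 * part] RNN outputs.
        Returns one representation per step for the next-word softmax."""
        keys, values, predicts = outputs.split(self.part, dim=-1)
        reps = []
        for t in range(outputs.size(1)):
            lo = max(0, t - self.window)
            if lo == t:  # no history yet: use only the predict part (assumed fallback)
                reps.append(torch.tanh(self.W_x(predicts[:, t])))
                continue
            mem_k = keys[:, lo:t]              # [batch, L, part] keys of previous steps
            mem_v = values[:, lo:t]            # [batch, L, part] values of previous steps
            scores = self.w(torch.tanh(self.W_y(mem_k) + self.W_h(keys[:, t]).unsqueeze(1)))
            alpha = F.softmax(scores, dim=1)   # attention over the last L steps
            r = (alpha * mem_v).sum(dim=1)     # attended value vector
            reps.append(torch.tanh(self.W_r(r) + self.W_x(predicts[:, t])))
        return torch.stack(reps, dim=1)
```

The "much simpler" baseline the abstract mentions can be obtained by dropping the attention entirely and concatenating the output representations of the last few time steps before the softmax.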

Cite

Text

Daniluk et al. "Frustratingly Short Attention Spans in Neural Language Modeling." International Conference on Learning Representations, 2017.

Markdown

[Daniluk et al. "Frustratingly Short Attention Spans in Neural Language Modeling." International Conference on Learning Representations, 2017.](https://mlanthology.org/iclr/2017/daniluk2017iclr-frustratingly/)

BibTeX

@inproceedings{daniluk2017iclr-frustratingly,
  title     = {{Frustratingly Short Attention Spans in Neural Language Modeling}},
  author    = {Daniluk, Michal and Rocktäschel, Tim and Welbl, Johannes and Riedel, Sebastian},
  booktitle = {International Conference on Learning Representations},
  year      = {2017},
  url       = {https://mlanthology.org/iclr/2017/daniluk2017iclr-frustratingly/}
}