Compressive Transformers for Long-Range Sequence Modelling

Abstract

We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results in the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.
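The abstract only names the mechanism, so the following is a minimal sketch of how "compressing past memories" can work at one layer: old activations are not discarded when the memory is full, but pooled into a smaller compressed memory. Mean pooling stands in for the paper's learned compression functions (the paper also studies convolutional and most-used compressions), and the names `update_memories`, `mem_len`, `comp_len`, and `c` are illustrative, not the authors' code.

```python
import numpy as np

def update_memories(memory, comp_memory, new_hidden, mem_len, comp_len, c=3):
    """One memory update step, sketched for a single layer.

    memory       [m, d]  -- uncompressed activations from recent segments
    comp_memory  [cm, d] -- compressed activations from older segments
    new_hidden   [s, d]  -- activations of the segment just processed
    c            -- compression rate: every c evicted slots become one
    """
    # Append the new segment's activations to the short-term memory.
    memory = np.concatenate([memory, new_hidden], axis=0)

    # Once the memory exceeds its budget, evict the oldest activations...
    if memory.shape[0] > mem_len:
        overflow, memory = memory[:-mem_len], memory[-mem_len:]

        # ...but compress them instead of discarding them. Mean pooling at
        # stride c is a stand-in for the paper's learned compression functions.
        chunks = [overflow[i:i + c] for i in range(0, len(overflow), c)]
        compressed = np.stack([chunk.mean(axis=0) for chunk in chunks])

        # The compressed memory is itself bounded: oldest entries fall off.
        comp_memory = np.concatenate([comp_memory, compressed], axis=0)[-comp_len:]

    return memory, comp_memory

# Toy usage: attention for the next segment would attend over the
# concatenation [comp_memory; memory; current segment], oldest to newest.
d = 16
memory, comp_memory = np.zeros((0, d)), np.zeros((0, d))
for _ in range(6):
    segment = np.random.randn(8, d)  # stand-in for one segment's hidden states
    memory, comp_memory = update_memories(memory, comp_memory, segment,
                                          mem_len=16, comp_len=16, c=3)
print(memory.shape, comp_memory.shape)
```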

Cite

Text

Rae et al. "Compressive Transformers for Long-Range Sequence Modelling." International Conference on Learning Representations, 2020.

Markdown

[Rae et al. "Compressive Transformers for Long-Range Sequence Modelling." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/rae2020iclr-compressive/)

BibTeX

@inproceedings{rae2020iclr-compressive,
  title     = {{Compressive Transformers for Long-Range Sequence Modelling}},
  author    = {Rae, Jack W. and Potapenko, Anna and Jayakumar, Siddhant M. and Hillier, Chloe and Lillicrap, Timothy P.},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/rae2020iclr-compressive/}
}