Encoding Recurrence into Transformers
Abstract
This paper shows that an RNN layer can be decomposed, with negligible loss, into a sequence of simple RNNs, each of which can in turn be rewritten as a lightweight positional encoding matrix for self-attention, named the Recurrence Encoding Matrix (REM). The recurrent dynamics introduced by the RNN layer can thus be encapsulated in the positional encodings of a multi-head self-attention, which makes it possible to seamlessly incorporate these dynamics into a Transformer, yielding a new module, Self-Attention with Recurrence (RSA). The proposed module can leverage the recurrent inductive bias of the REMs to achieve better sample efficiency than its baseline Transformer, while the self-attention models the remaining non-recurrent signals. The relative proportions of the two components are controlled by a data-driven gating mechanism, and the effectiveness of the RSA module is demonstrated on four sequential learning tasks.
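
The abstract describes the RSA computation only at a high level. The sketch below illustrates, under stated assumptions, how a single RSA head might combine a regular REM (a lower-triangular matrix with entries lambda^(i-j) for i >= j, lambda a learnable decay) with ordinary softmax attention through a learned gate sigma(mu). This is a minimal PyTorch sketch, not the authors' implementation; the names RSAHead, regular_rem, decay_logit, and mu are illustrative, and only the scalar-decay (regular) REM variant is shown.

# Minimal sketch of one RSA head: gated mix of softmax attention and a regular REM.
# Assumptions: regular REM P[i, j] = decay**(i - j) for i >= j, gate sigma(mu) per head.
import math
import torch
import torch.nn as nn

def regular_rem(seq_len: int, decay: torch.Tensor) -> torch.Tensor:
    """Lower-triangular Recurrence Encoding Matrix with entries decay**(i - j)."""
    idx = torch.arange(seq_len, device=decay.device)
    exponent = idx.unsqueeze(1) - idx.unsqueeze(0)            # (i - j)
    causal = exponent >= 0                                     # keep only i >= j
    return torch.where(causal, decay ** exponent.clamp(min=0), torch.zeros((), device=decay.device))

class RSAHead(nn.Module):
    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_head, bias=False)
        self.k = nn.Linear(d_model, d_head, bias=False)
        self.v = nn.Linear(d_model, d_head, bias=False)
        self.mu = nn.Parameter(torch.zeros(1))                 # gate logit (data-driven)
        self.decay_logit = nn.Parameter(torch.zeros(1))        # recurrence decay parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
        decay = torch.sigmoid(self.decay_logit)                # keep the decay in (0, 1)
        rem = regular_rem(seq_len, decay).to(x.dtype)
        gate = torch.sigmoid(self.mu)
        # Gated mixture: recurrent inductive bias (REM) vs. content-based attention.
        scores = (1 - gate) * attn + gate * rem
        return scores @ v

In this reading, the gate sigma(mu) is learned per head, so the data decide how much of each head's output comes from the recurrent pattern encoded by the REM and how much from standard content-based attention.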
Cite
Text
Huang et al. "Encoding Recurrence into Transformers." International Conference on Learning Representations, 2023.

Markdown
[Huang et al. "Encoding Recurrence into Transformers." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/huang2023iclr-encoding/)

BibTeX
@inproceedings{huang2023iclr-encoding,
  title     = {{Encoding Recurrence into Transformers}},
  author    = {Huang, Feiqing and Lu, Kexin and Cai, Yuxi and Qin, Zhen and Fang, Yanwen and Tian, Guangjian and Li, Guodong},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/huang2023iclr-encoding/}
}