Encoding Recurrence into Transformers
Abstract
This paper shows that an RNN layer can be decomposed, with negligible loss, into a sequence of simple RNNs, each of which can in turn be rewritten as a lightweight positional encoding matrix for self-attention, named the Recurrence Encoding Matrix (REM). The recurrent dynamics introduced by the RNN layer can thus be encapsulated in the positional encodings of a multi-head self-attention, which makes it possible to seamlessly incorporate these dynamics into a Transformer, yielding a new module, Self-Attention with Recurrence (RSA). The proposed module can leverage the recurrent inductive bias of the REMs to achieve better sample efficiency than its baseline Transformer, while the self-attention models the remaining non-recurrent signals. The relative proportions of the two components are controlled by a data-driven gating mechanism, and the effectiveness of the RSA module is demonstrated on four sequential learning tasks.
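
The abstract describes the RSA computation only at a high level. The sketch below illustrates, under stated assumptions, how a single RSA head might combine a regular REM (a lower-triangular matrix with entries lambda^(i-j) for i >= j, lambda a learnable decay) with ordinary softmax attention through a learned gate sigma(mu). This is a minimal PyTorch sketch, not the authors' implementation; the names RSAHead, regular_rem, decay_logit, and mu are illustrative, and only the scalar-decay (regular) REM variant is shown.

# Minimal sketch of one RSA head: gated mix of softmax attention and a regular REM.
# Assumptions: regular REM P[i, j] = decay**(i - j) for i >= j, gate sigma(mu) per head.
import math
import torch
import torch.nn as nn

def regular_rem(seq_len: int, decay: torch.Tensor) -> torch.Tensor:
    """Lower-triangular Recurrence Encoding Matrix with entries decay**(i - j)."""
    idx = torch.arange(seq_len, device=decay.device)
    exponent = idx.unsqueeze(1) - idx.unsqueeze(0)            # (i - j)
    causal = exponent >= 0                                     # keep only i >= j
    return torch.where(causal, decay ** exponent.clamp(min=0), torch.zeros((), device=decay.device))

class RSAHead(nn.Module):
    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_head, bias=False)
        self.k = nn.Linear(d_model, d_head, bias=False)
        self.v = nn.Linear(d_model, d_head, bias=False)
        self.mu = nn.Parameter(torch.zeros(1))                 # gate logit (data-driven)
        self.decay_logit = nn.Parameter(torch.zeros(1))        # recurrence decay parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
        decay = torch.sigmoid(self.decay_logit)                # keep the decay in (0, 1)
        rem = regular_rem(seq_len, decay).to(x.dtype)
        gate = torch.sigmoid(self.mu)
        # Gated mixture: recurrent inductive bias (REM) vs. content-based attention.
        scores = (1 - gate) * attn + gate * rem
        return scores @ v

In this reading, the gate sigma(mu) is learned per head, so the data decide how much of each head's output comes from the recurrent pattern encoded by the REM and how much from standard content-based attention.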
Cite
Text
Huang et al. "Encoding Recurrence into Transformers." International Conference on Learning Representations, 2023.

Markdown
[Huang et al. "Encoding Recurrence into Transformers." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/huang2023iclr-encoding/)

BibTeX
@inproceedings{huang2023iclr-encoding,
  title     = {{Encoding Recurrence into Transformers}},
  author    = {Huang, Feiqing and Lu, Kexin and Cai, Yuxi and Qin, Zhen and Fang, Yanwen and Tian, Guangjian and Li, Guodong},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/huang2023iclr-encoding/}
}