On the Representation Gap Between Modern RNNs and Transformers: The Curse of Memory Efficiency and the Fix of In-Context Retrieval

Abstract

This paper investigates the limitations of Recurrent Neural Networks (RNNs) on algorithmic tasks, particularly in comparison with Transformers. Focusing on the reasoning task IsTree, which asks whether a given graph is a tree, we demonstrate that RNNs with $o(n)$-bit parameters cannot solve this task on graphs of size $n$, even with Chain-of-Thought (CoT), whereas Transformers can solve it with CoT using only $O(\log n)$-bit parameters. Our experiments confirm this representation gap. To overcome this limitation, we propose augmenting RNNs with in-context retrieval capabilities, specifically via regular expressions. This enhancement enables RNNs to solve IsTree and other algorithmic problems in $\mathsf{P}$ while maintaining their memory efficiency, closing the gap with Transformers.
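
For concreteness, the sketch below illustrates the IsTree decision problem itself (not the paper's model constructions): an undirected graph on $n$ vertices is a tree exactly when it has $n-1$ edges and contains no cycle. The edge-list input format and the union-find check are illustrative assumptions, not the encoding used in the paper.

```python
# Minimal sketch of the IsTree task: decide whether a graph is a tree.
# Assumed input: vertex count n and an undirected edge list; this format is
# an illustration, not the paper's token-level encoding of the problem.

def is_tree(n, edges):
    """Return True iff the undirected graph (range(n), edges) is a tree."""
    if n == 0 or len(edges) != n - 1:
        return False  # a tree on n vertices has exactly n - 1 edges

    # Union-find: n - 1 edges form a tree iff none of them closes a cycle.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:       # edge closes a cycle -> not a tree
            return False
        parent[ru] = rv

    return True


if __name__ == "__main__":
    print(is_tree(4, [(0, 1), (1, 2), (1, 3)]))  # True: a star-shaped tree
    print(is_tree(4, [(0, 1), (1, 2), (2, 0)]))  # False: contains a cycle
```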

Cite

Text

Wen et al. "On the Representation Gap Between Modern RNNs and Transformers: The Curse of Memory Efficiency and the Fix of In-Context Retrieval." ICLR 2024 Workshops: ME-FoMo, 2024.

Markdown

[Wen et al. "On the Representation Gap Between Modern RNNs and Transformers: The Curse of Memory Efficiency and the Fix of In-Context Retrieval." ICLR 2024 Workshops: ME-FoMo, 2024.](https://mlanthology.org/iclrw/2024/wen2024iclrw-representation/)

BibTeX

@inproceedings{wen2024iclrw-representation,
  title     = {{On the Representation Gap Between Modern RNNs and Transformers: The Curse of Memory Efficiency and the Fix of In-Context Retrieval}},
  author    = {Wen, Kaiyue and Dang, Xingyu and Lyu, Kaifeng},
  booktitle = {ICLR 2024 Workshops: ME-FoMo},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/wen2024iclrw-representation/}
}