Learning Adaptive Control Flow in Transformers for Improved Systematic Generalization

Abstract

Transformers have limited success in systematic generalization. The situation is especially frustrating in the case of algorithmic tasks, where they often fail to find intuitive solutions that route relevant information to the right node/operation at the right time in the grid represented by Transformer columns. To facilitate the learning of useful control flow, we propose two modifications to the Transformer architecture: a copy gate and geometric attention. Our novel Neural Data Router (NDR) achieves 100% length generalization accuracy on the compositional table lookup task. NDR’s attention and gating patterns tend to be interpretable as an intuitive form of neural routing.

Cite

Text

Csordás et al. "Learning Adaptive Control Flow in Transformers for Improved Systematic Generalization." NeurIPS 2021 Workshops: AIPLANS, 2021.

Markdown

[Csordás et al. "Learning Adaptive Control Flow in Transformers for Improved Systematic Generalization." NeurIPS 2021 Workshops: AIPLANS, 2021.](https://mlanthology.org/neuripsw/2021/csordas2021neuripsw-learning/)

BibTeX

@inproceedings{csordas2021neuripsw-learning,
  title     = {{Learning Adaptive Control Flow in Transformers for Improved Systematic Generalization}},
  author    = {Csordás, Róbert and Irie, Kazuki and Schmidhuber, Jürgen},
  booktitle = {NeurIPS 2021 Workshops: AIPLANS},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/csordas2021neuripsw-learning/}
}