Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

Ramesh, Rahul; Lubana, Ekdeep Singh; Khona, Mikail; Dick, Robert P.; Tanaka, Hidenori

ICML 2024 pp. 42074-42103

Abstract

Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing simple logical operations. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we aim to assess in this paper “how capable can a Transformer become?” Specifically, we train autoregressive Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities. Through a series of extensive and systematic experiments on this data-generating process, we show that: (1) autoregressive Transformers can learn compositional structures from small amounts of training data and generalize to exponentially or even combinatorially many functions; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions than generating no intermediate outputs; (3) biases in the order of the compositions in the training data result in Transformers that fail to compose some combinations of functions; and (4) the attention layers seem to select the capability to apply, while the feed-forward layers execute the capability.
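The contrast in finding (2), between composing functions with and without intermediate outputs, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three toy functions, the names `FUNCTIONS`, `compose_direct`, `compose_step_by_step`, and `all_compositions`, and the use of integer lists as inputs are hypothetical and are not taken from the paper's data-generating process.

```python
from itertools import product

# Hypothetical "monolithic capabilities": simple maps over a list of ints.
# These are illustrative stand-ins, not the functions used in the paper.
FUNCTIONS = {
    "reverse": lambda xs: xs[::-1],
    "increment": lambda xs: [x + 1 for x in xs],
    "double": lambda xs: [2 * x for x in xs],
}


def compose_direct(names, xs):
    """Apply the named functions in order and return only the final output."""
    for name in names:
        xs = FUNCTIONS[name](xs)
    return xs


def compose_step_by_step(names, xs):
    """Apply the named functions in order and return every intermediate output."""
    trace = []
    for name in names:
        xs = FUNCTIONS[name](xs)
        trace.append(xs)
    return trace


def all_compositions(depth):
    """Enumerate every ordered sequence of `depth` function names (with repetition)."""
    return list(product(FUNCTIONS, repeat=depth))


if __name__ == "__main__":
    names = ["reverse", "increment"]
    print(compose_direct(names, [1, 2, 3]))
    print(compose_step_by_step(names, [1, 2, 3]))
    # The number of compositions grows as len(FUNCTIONS) ** depth.
    print(len(all_compositions(2)))
```

In this toy setting, a direct target consists of the final list only, while a step-by-step target also exposes each intermediate list. The count of possible compositions grows exponentially with depth, which is the sense in which a small set of base functions can yield a very large set of composed ones.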

Cite

Text

Ramesh et al. "Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks." International Conference on Machine Learning, 2024.

Markdown

[Ramesh et al. "Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/ramesh2024icml-compositional/)

BibTeX

@inproceedings{ramesh2024icml-compositional,
  title     = {{Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks}},
  author    = {Ramesh, Rahul and Lubana, Ekdeep Singh and Khona, Mikail and Dick, Robert P. and Tanaka, Hidenori},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {42074--42103},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/ramesh2024icml-compositional/}
}