Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Abstract
Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing simple logical operations. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion in the operations it can perform on an input. Motivated by this, we ask in this paper: “how capable can a Transformer become?” Specifically, we train autoregressive Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities. Through a series of extensive and systematic experiments on this data-generating process, we show that: (1) autoregressive Transformers can learn compositional structures from small amounts of training data and generalize to exponentially or even combinatorially many functions; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions than generating no intermediate outputs; (3) biases in the order of the compositions in the training data result in Transformers that fail to compose some combinations of functions; and (4) the attention layers seem to select the capability to apply, while the feed-forward layers execute the capability.
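To make the idea of composing monolithic capabilities concrete, here is a minimal toy sketch. It is not the authors' code or their actual data-generating process: the three functions (`shift`, `reverse`, `double`), the vocabulary size, and the helper names are all hypothetical, chosen only for illustration. It shows the two output formats contrasted in finding (2), with and without intermediate outputs, and how the number of possible compositions grows exponentially with depth.

```python
from itertools import product

# Assumption: a tiny integer vocabulary; the paper's token set is not given here.
VOCAB_SIZE = 10

# Hypothetical "monolithic capabilities": simple maps on a tuple of tokens.
def shift(tokens):
    return tuple((t + 1) % VOCAB_SIZE for t in tokens)

def reverse(tokens):
    return tuple(reversed(tokens))

def double(tokens):
    return tuple((2 * t) % VOCAB_SIZE for t in tokens)

CAPABILITIES = {"shift": shift, "reverse": reverse, "double": double}

def compose(names, tokens, step_by_step):
    """Apply the named functions left to right.

    With step_by_step=True, return every intermediate output;
    otherwise return only the final output (as a one-element list).
    """
    outputs = []
    current = tokens
    for name in names:
        current = CAPABILITIES[name](current)
        outputs.append(current)
    return outputs if step_by_step else outputs[-1:]

def all_compositions(depth):
    """Enumerate every ordered choice of `depth` functions (with repetition)."""
    return list(product(CAPABILITIES, repeat=depth))

if __name__ == "__main__":
    x = (1, 2, 3)
    print(compose(["shift", "reverse"], x, step_by_step=True))
    print(compose(["shift", "reverse"], x, step_by_step=False))
    # With 3 functions, depth d gives 3**d compositions.
    for depth in range(1, 5):
        print(depth, len(all_compositions(depth)))
```

In this toy setting a model trained on a small subset of `all_compositions(depth)` would be tested on the held-out remainder; the step-by-step format exposes each intermediate result as a target, while the direct format exposes only the final one.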
Cite
Text
Ramesh et al. "Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks." International Conference on Machine Learning, 2024.
Markdown
[Ramesh et al. "Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/ramesh2024icml-compositional/)
BibTeX
@inproceedings{ramesh2024icml-compositional,
title = {{Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks}},
author = {Ramesh, Rahul and Lubana, Ekdeep Singh and Khona, Mikail and Dick, Robert P. and Tanaka, Hidenori},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {42074--42103},
volume = {235},
url = {https://mlanthology.org/icml/2024/ramesh2024icml-compositional/}
}