Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model
Abstract
Stepwise inference protocols, such as scratchpads and chain-of-thought, help language models solve complex problems by decomposing them into a sequence of simpler subproblems. To unravel the underlying mechanisms of stepwise inference, we propose to study autoregressive Transformer models on a synthetic task that embodies the multi-step nature of problems where stepwise inference is generally most useful. Specifically, we define a graph navigation problem wherein a model is tasked with traversing a path from a start node to a goal node on the graph. We find that we can empirically reproduce and analyze several phenomena observed at scale: (i) the stepwise inference reasoning gap, the cause of which we find in the structure of the training data; (ii) a diversity-accuracy trade-off in model generations as sampling temperature varies; (iii) a simplicity bias in the model’s output; and (iv) compositional generalization and a primacy bias with in-context exemplars. Overall, our work introduces a grounded, synthetic framework for studying stepwise inference and offers mechanistic hypotheses that can lay the foundation for a deeper understanding of this phenomenon.
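To make the setup concrete, here is a minimal, hypothetical sketch of how training data for such a graph navigation task might be generated: sample a random DAG, then emit sequences of the form `start goal : start … goal` that spell out a full path step by step. The node count, edge probability, function names, and token format below are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical data generator for a synthetic graph-navigation task:
# sample a random DAG, then emit "start goal : <full path>" strings
# that demonstrate stepwise traversal. All hyperparameters are
# illustrative, not the paper's actual settings.
import random

def sample_dag(num_nodes: int = 10, edge_prob: float = 0.4, seed: int = 0):
    """Return adjacency lists of a random DAG on nodes 0..num_nodes-1."""
    rng = random.Random(seed)
    adj = {v: [] for v in range(num_nodes)}
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):  # edges only point "forward", so no cycles
            if rng.random() < edge_prob:
                adj[u].append(v)
    return adj

def random_path(adj, start, goal, rng):
    """Random forward walk from start; return the path if it reaches goal, else None."""
    path = [start]
    while path[-1] != goal:
        nxt = [v for v in adj[path[-1]] if v <= goal]  # never overshoot the goal
        if not nxt:
            return None  # dead end; caller will resample
        path.append(rng.choice(nxt))
    return path

def make_example(adj, rng):
    """One training string: prompt 'start goal :' followed by the full path."""
    while True:
        start, goal = sorted(rng.sample(list(adj), 2))
        path = random_path(adj, start, goal, rng)
        if path:
            return f"{start} {goal} : " + " ".join(map(str, path))

rng = random.Random(1)
adj = sample_dag()
for _ in range(3):
    print(make_example(adj, rng))
```

Training an autoregressive Transformer on strings like these, and prompting it at test time with only the `start goal :` prefix, would mirror the stepwise-inference regime the paper studies: the model must emit the intermediate nodes one by one rather than jump directly to the goal.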
Cite
Text
Khona et al. "Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model." International Conference on Machine Learning, 2024.
Markdown
[Khona et al. "Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/khona2024icml-understanding/)
BibTeX
@inproceedings{khona2024icml-understanding,
title = {{Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model}},
author = {Khona, Mikail and Okawa, Maya and Hula, Jan and Ramesh, Rahul and Nishi, Kento and Dick, Robert P. and Lubana, Ekdeep Singh and Tanaka, Hidenori},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {23758--23780},
volume = {235},
url = {https://mlanthology.org/icml/2024/khona2024icml-understanding/}
}