Constrained Belief Updating and Geometric Structures in Transformer Representations
Abstract
How do transformers trained on next-token prediction represent their inputs? Our analysis reveals that in simple settings, transformers form intermediate representations with fractal structures distinct from, yet closely related to, the geometry of the belief states of an optimal predictor. We identify the algorithmic process by which these representations form and connect this mechanism to constrained belief updating equations, offering insight into the geometric meaning of these fractals. These findings bridge the gap between the model-agnostic theory of belief state geometry and the specific architectural constraints of transformers.
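For context on the belief updating the abstract refers to: an optimal predictor of a hidden Markov process maintains a probability distribution eta over hidden states and updates it with each observed token x via Bayes' rule, eta' ∝ eta T^(x), where T^(x)_{ij} = P(emit x, next state j | current state i) is the token-labeled transition matrix. Below is a minimal sketch of this update in Python; the two-state process and its matrices are hypothetical values chosen for illustration, not the process studied in the paper.

import numpy as np

# Token-labeled transition matrices for a hypothetical two-state HMM:
# T[x][i, j] = P(emit token x, move to state j | current state i).
# For each state i, the entries summed over both tokens and next
# states equal 1.
T = {
    0: np.array([[0.6, 0.1],
                 [0.1, 0.2]]),
    1: np.array([[0.2, 0.1],
                 [0.1, 0.6]]),
}

def update_belief(eta, x):
    """One step of Bayesian belief updating: eta' ∝ eta @ T[x]."""
    unnormalized = eta @ T[x]
    return unnormalized / unnormalized.sum()

# Each observed token moves the belief state within the probability
# simplex; the set of beliefs reachable from the prior traces out the
# geometry (fractal for richer processes) that the paper compares
# against a transformer's intermediate representations.
eta = np.array([0.5, 0.5])  # uniform prior over hidden states
for x in [0, 1, 1, 0]:
    eta = update_belief(eta, x)
    print(eta)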
Cite
Text
Piotrowski et al. "Constrained Belief Updating and Geometric Structures in Transformer Representations." NeurIPS 2024 Workshops: NeurReps, 2024.Markdown
[Piotrowski et al. "Constrained Belief Updating and Geometric Structures in Transformer Representations." NeurIPS 2024 Workshops: NeurReps, 2024.](https://mlanthology.org/neuripsw/2024/piotrowski2024neuripsw-constrained/)BibTeX
@inproceedings{piotrowski2024neuripsw-constrained,
title = {{Constrained Belief Updating and Geometric Structures in Transformer Representations}},
author = {Piotrowski, Mateusz and Riechers, Paul M. and Filan, Daniel and Shai, Adam},
booktitle = {NeurIPS 2024 Workshops: NeurReps},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/piotrowski2024neuripsw-constrained/}
}