On the Origins of Linear Representations in Large Language Models
Abstract
An array of recent works has argued that high-level semantic concepts are encoded "linearly" in the representation space of large language models. In this work, we study the origins of such linear representations. To that end, we introduce a latent variable model to abstract and formalize the concept dynamics of next-token prediction. We use this formalism to prove that linearity arises as a consequence of the loss function and the implicit bias of gradient descent. We further substantiate the theory empirically through experiments.
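To make the "linear encoding" claim concrete, here is a minimal illustrative sketch (not the paper's latent variable model): we simulate representations in which a binary concept shifts embeddings along a fixed direction, then recover that direction with the standard difference-of-class-means heuristic. All names and parameters here are hypothetical.

```python
# Illustrative sketch of a linearly encoded concept (hypothetical setup,
# not the paper's model): representations for the "concept on" class are
# shifted along a fixed direction w, which we recover from class means.
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 500  # representation dimension, samples per class

# Ground-truth concept direction (unit norm).
w = rng.normal(size=d)
w /= np.linalg.norm(w)

# Base representations plus a shift along w for the positive class.
base = rng.normal(size=(2 * n, d))
labels = np.repeat([0, 1], n)
reps = base + 3.0 * labels[:, None] * w

# Recover the concept direction as the difference of class means.
w_hat = reps[labels == 1].mean(axis=0) - reps[labels == 0].mean(axis=0)
w_hat /= np.linalg.norm(w_hat)

cos_sim = float(w_hat @ w)
print(f"cosine similarity, true vs. recovered direction: {cos_sim:.3f}")
```

When the concept really is encoded linearly, the recovered direction aligns closely with the true one; the paper asks why trained models end up in this regime at all.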
Cite
Text
Jiang et al. "On the Origins of Linear Representations in Large Language Models." International Conference on Machine Learning, 2024.
Markdown
[Jiang et al. "On the Origins of Linear Representations in Large Language Models." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/jiang2024icml-origins/)
BibTeX
@inproceedings{jiang2024icml-origins,
title = {{On the Origins of Linear Representations in Large Language Models}},
author = {Jiang, Yibo and Rajendran, Goutham and Ravikumar, Pradeep Kumar and Aragam, Bryon and Veitch, Victor},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {21879--21911},
volume = {235},
url = {https://mlanthology.org/icml/2024/jiang2024icml-origins/}
}