Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT

Abstract

Foundation models exhibit significant capabilities in decision-making and logical deduction. Nonetheless, debate continues over whether they genuinely understand the world or merely perform stochastic mimicry. This paper examines a simple transformer trained on Othello, extending prior research on the emergent world model of Othello-GPT. The investigation reveals that Othello-GPT encodes a linear representation of opposing pieces, and that this representation causally steers its decision-making. The paper further examines the interplay between the linear world representation and causal decision-making, and how both depend on layer depth and model complexity.
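To make the notion of a linear representation concrete, the following is a minimal sketch of a linear probe, assuming a ridge-regression classifier fit on hidden activations. It is not the authors' implementation: the activations are synthetic stand-ins rather than Othello-GPT residual streams, and the names `fit_linear_probe`, `probe_predict`, and `directions` are hypothetical.

```python
import numpy as np

def fit_linear_probe(acts, labels, num_classes, l2=1e-3):
    """Fit a ridge-regression probe from activations (n, d) to one-hot labels."""
    n, d = acts.shape
    targets = np.eye(num_classes)[labels]          # one-hot targets, shape (n, k)
    X = np.hstack([acts, np.ones((n, 1))])         # append a bias column
    # Closed-form ridge solution: (X^T X + l2 * I)^-1 X^T Y
    W = np.linalg.solve(X.T @ X + l2 * np.eye(d + 1), X.T @ targets)
    return W                                       # shape (d + 1, k)

def probe_predict(W, acts):
    """Return the class with the highest linear score for each activation."""
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    return (X @ W).argmax(axis=1)

# Synthetic stand-in for one board square (assumption, not real model data):
# each of three hypothetical tile states shifts the activation along its own
# direction, plus Gaussian noise.
rng = np.random.default_rng(0)
d, n, num_classes = 32, 2000, 3
directions = rng.normal(size=(num_classes, d))
labels = rng.integers(0, num_classes, size=n)
acts = directions[labels] + 0.5 * rng.normal(size=(n, d))

# Fit on the first 1500 samples, evaluate on the held-out remainder.
W = fit_linear_probe(acts[:1500], labels[:1500], num_classes)
accuracy = (probe_predict(W, acts[1500:]) == labels[1500:]).mean()
print(f"held-out probe accuracy: {accuracy:.3f}")
```

High held-out accuracy from a purely linear map is the kind of evidence usually taken to indicate that a feature is linearly encoded. Showing that the feature is causal requires a separate step, intervening on the activations and observing the model's outputs, which this sketch does not cover.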

Cite

Text

Hazineh et al. "Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT." NeurIPS 2023 Workshops: SoLaR, 2023.

Markdown

[Hazineh et al. "Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT." NeurIPS 2023 Workshops: SoLaR, 2023.](https://mlanthology.org/neuripsw/2023/hazineh2023neuripsw-linear/)

BibTeX

@inproceedings{hazineh2023neuripsw-linear,
  title     = {{Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT}},
  author    = {Hazineh, Dean and Zhang, Zechen and Chiu, Jeffrey},
  booktitle = {NeurIPS 2023 Workshops: SoLaR},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/hazineh2023neuripsw-linear/}
}