Do LLMs Selectively Encode the Goal of an Agent's Reach?

Abstract

In this work, we investigate whether large language models (LLMs) exhibit one of the earliest Theory of Mind-like behaviors: selectively encoding the goal object of an actor's reach (Woodward, 1998). We prompt state-of-the-art LLMs with ambiguous examples that can be explained by either an object or a location being the goal of an actor's reach, and evaluate each model's bias. We compare the magnitude of the bias across three situations: i) an agent acting purposefully, ii) an inanimate object being acted upon, and iii) an agent acting accidentally. We find that two models show a selective bias for agents acting purposefully, but are biased differently than humans. Additionally, the encoding is not robust to semantically equivalent prompt variations. We discuss how this bias compares to the bias infants show and provide a cautionary tale for evaluating machine Theory of Mind (ToM). We release our dataset and code.
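The sketch below illustrates, under stated assumptions, how the three-condition probe described in the abstract could be set up: score a model's preference between an "object" continuation and a "location" continuation after the two objects swap places. The prompts, condition names, and the `dummy_score` stub are hypothetical and are not taken from the paper's released code; a real evaluation would replace the stub with the log-probability the model under test assigns to each continuation.

```python
# Hypothetical sketch (not the authors' released code) of a Woodward-style
# object-vs-location bias probe for LLMs, as described in the abstract.

from typing import Callable, Dict

# One ambiguous scene per condition; after the objects swap locations, the chosen
# continuation reveals whether the model encoded the object or the location as the goal.
CONDITIONS: Dict[str, str] = {
    "purposeful_agent": (
        "Sally looks at the teddy bear on the left and the ball on the right. "
        "She reaches for the teddy bear. The toys then swap places. Sally reaches for the"
    ),
    "inanimate_object": (
        "A mechanical claw hangs over the teddy bear on the left and the ball on the right. "
        "It descends onto the teddy bear. The toys then swap places. The claw descends onto the"
    ),
    "accidental_agent": (
        "Sally stumbles and her hand lands on the teddy bear on the left, next to the ball. "
        "The toys then swap places. Sally's hand lands on the"
    ),
}

OBJECT_CONTINUATION = " teddy bear"    # same object, new location
LOCATION_CONTINUATION = " ball"        # same location, new object


def dummy_score(prompt: str, continuation: str) -> float:
    """Placeholder scorer: replace with the log-probability the model under test
    assigns to `continuation` given `prompt` (e.g. via token logprobs)."""
    return 0.0


def object_bias(score: Callable[[str, str], float], prompt: str) -> float:
    """Positive values: the model prefers the object reading; negative: the location reading."""
    return score(prompt, OBJECT_CONTINUATION) - score(prompt, LOCATION_CONTINUATION)


if __name__ == "__main__":
    for name, prompt in CONDITIONS.items():
        print(f"{name}: bias = {object_bias(dummy_score, prompt):+.3f}")
```

A selective encoding, in the sense tested here, would show a larger object bias in the purposeful-agent condition than in the inanimate-object or accidental-agent conditions.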

Cite

Text

Ruis et al. "Do LLMs Selectively Encode the Goal of an Agent's Reach?" ICML 2023 Workshops: ToM, 2023.

Markdown

[Ruis et al. "Do LLMs Selectively Encode the Goal of an Agent's Reach?" ICML 2023 Workshops: ToM, 2023.](https://mlanthology.org/icmlw/2023/ruis2023icmlw-llms/)

BibTeX

@inproceedings{ruis2023icmlw-llms,
  title     = {{Do LLMs Selectively Encode the Goal of an Agent's Reach?}},
  author    = {Ruis, Laura and Findeis, Arduin and Bradley, Herbie and Rahmani, Hossein A. and Choe, Kyoung Whan and Grefenstette, Edward and Rocktäschel, Tim},
  booktitle = {ICML 2023 Workshops: ToM},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/ruis2023icmlw-llms/}
}