Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding

Abstract

Guesstimation, the task of making approximate quantity estimates, is a common real-world challenge, yet it has been largely overlooked in research on large language models (LLMs). We introduce a novel guesstimation dataset, MARBLES, which requires estimating how many items (e.g., marbles) can fit into containers (e.g., a one-cup measuring cup), both with and without accompanying images. Inspired by the social science concept of the "Wisdom of Crowds" (WOC), which takes the median of estimates from a crowd and has proven effective in guesstimation, we propose a "WOC decoding" strategy for LLM guesstimation. We show that LLMs perform well on guesstimation, suggesting that they possess some level of a "world model" necessary for the task. Moreover, as with human crowds, WOC decoding improves LLM guesstimation accuracy. Furthermore, including images in the multimodal condition enhances model performance. These results highlight the value of the WOC decoding strategy for LLMs and position guesstimation as a probe for evaluating LLMs' world models.
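
The abstract describes WOC decoding only as taking the median of a crowd of estimates. A minimal sketch of that idea follows, assuming the "crowd" is formed by sampling the same model several times. The names `woc_decode`, `parse_estimate`, `sample_fn`, and `n_samples` are hypothetical and are not taken from the paper; `sample_fn` stands in for any function that returns one sampled text response for a prompt.

```python
import re
import statistics
from typing import Callable, Optional


def parse_estimate(text: str) -> Optional[float]:
    """Return the first number found in a response, or None if there is none."""
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators such as the comma in "1,000".
    return float(match.group(0).replace(",", ""))


def woc_decode(sample_fn: Callable[[str], str], prompt: str, n_samples: int = 20) -> float:
    """Sample several responses and return the median of the parsed estimates.

    Assumption: sample_fn is a hypothetical caller-supplied function that
    returns one stochastic model response per call.
    """
    estimates = []
    for _ in range(n_samples):
        value = parse_estimate(sample_fn(prompt))
        if value is not None:  # skip responses that contain no number
            estimates.append(value)
    if not estimates:
        raise ValueError("no numeric estimates were produced")
    return statistics.median(estimates)
```

The median is used rather than the mean because a single extreme guess shifts it far less: for the estimates 120, 130, 150, and 1000, the median is 140 while the mean is 350.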

Cite

Text

Chuang et al. "Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding." NeurIPS 2024 Workshops: Behavioral_ML, 2024.

Markdown

[Chuang et al. "Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding." NeurIPS 2024 Workshops: Behavioral_ML, 2024.](https://mlanthology.org/neuripsw/2024/chuang2024neuripsw-probing/)

BibTeX

@inproceedings{chuang2024neuripsw-probing,
  title     = {{Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding}},
  author    = {Chuang, Yun-Shiuan and Harlalka, Nikunj and Narendran, Sameer and Cheung, Alexander and Gao, Sizhe and Suresh, Siddharth and Hu, Junjie and Rogers, Timothy T.},
  booktitle = {NeurIPS 2024 Workshops: Behavioral_ML},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/chuang2024neuripsw-probing/}
}