A Peek into Token Bias: Large Language Models Are Not yet Genuine Reasoners

Abstract

This study proposes a hypothesis-testing framework to determine whether large language models (LLMs) possess genuine reasoning abilities or rely on token bias. Carefully controlled synthetic datasets are generated, and null hypotheses assuming that LLMs have genuine reasoning capabilities are tested with statistical guarantees. Inconsistent model behavior across these experiments leads to rejection of the null hypotheses. Using the conjunction fallacy as a quintessential example, our findings suggest that current LLMs still struggle with probabilistic reasoning and that apparent performance improvements are largely attributable to token bias.
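The conjunction fallacy here refers to judging a conjunction as more probable than one of its conjuncts, as in the classic Linda problem, where rating "bank teller and feminist" as more likely than "bank teller" alone violates P(A ∧ B) ≤ P(A). The sketch below illustrates one plausible way to operationalize the abstract's hypothesis test; it is not the authors' released code. The helper `ask_llm`, the paired original/perturbed items, and the choice of a McNemar-style exact binomial test are all illustrative assumptions: under the null hypothesis that the model reasons over semantics rather than surface tokens, a semantics-preserving token perturbation (e.g., renaming "Linda") should flip correct answers to incorrect no more often than the reverse.

from scipy.stats import binomtest

def ask_llm(question: str) -> bool:
    """Hypothetical LLM wrapper; returns True iff the model's answer
    avoids the conjunction fallacy on this item. Plug in a real API."""
    raise NotImplementedError

def token_bias_test(pairs: list[tuple[str, str]], alpha: float = 0.05) -> bool:
    """McNemar-style exact test on paired items (original, perturbed).

    H0: the model's correctness is insensitive to semantics-preserving
    token perturbations. Rejecting H0 is evidence of token bias.
    """
    correct_to_wrong = wrong_to_correct = 0
    for original, perturbed in pairs:
        before, after = ask_llm(original), ask_llm(perturbed)
        if before and not after:
            correct_to_wrong += 1
        elif after and not before:
            wrong_to_correct += 1
    discordant = correct_to_wrong + wrong_to_correct
    if discordant == 0:
        return False  # no discordant pairs: no evidence against H0
    # Exact one-sided binomial test: under H0, each discordant pair is a
    # fair coin flip between the two flip directions.
    result = binomtest(correct_to_wrong, discordant, p=0.5, alternative="greater")
    return result.pvalue < alpha  # True -> reject H0 (token bias detected)

A statistically significant excess of correct-to-wrong flips at level alpha gives the "statistical guarantee" flavor the abstract describes: the rejection controls the false-positive rate of declaring token bias when the model in fact reasons consistently.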

Cite

Text

Jiang et al. "A Peek into Token Bias: Large Language Models Are Not yet Genuine Reasoners." ICML 2024 Workshops: LLMs_and_Cognition, 2024.

Markdown

[Jiang et al. "A Peek into Token Bias: Large Language Models Are Not yet Genuine Reasoners." ICML 2024 Workshops: LLMs_and_Cognition, 2024.](https://mlanthology.org/icmlw/2024/jiang2024icmlw-peek/)

BibTeX

@inproceedings{jiang2024icmlw-peek,
  title     = {{A Peek into Token Bias: Large Language Models Are Not yet Genuine Reasoners}},
  author    = {Jiang, Bowen and Xie, Yangxinyu and Hao, Zhuoqun and Wang, Xiaomeng and Mallick, Tanwi and Su, Weijie J and Taylor, Camillo Jose and Roth, Dan},
  booktitle = {ICML 2024 Workshops: LLMs_and_Cognition},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/jiang2024icmlw-peek/}
}