ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
Abstract
We investigate the logical reasoning capabilities of Large Language Models (LLMs) and their scalability across complex deductive tasks. Using ZebraLogic, a newly developed benchmark dataset of logic grid puzzles derived from constraint satisfaction problems (CSPs), we systematically evaluate LLM performance. ZebraLogic spans a broad range of search space complexities and incorporates diverse logical constraints, providing a controlled environment to assess reasoning abilities. Our results reveal a significant decline in accuracy as problem complexity increases—a phenomenon we term the “curse of complexity.” Notably, this limitation persists even with scaling model size and inference-time computation, suggesting fundamental constraints in current LLM reasoning capabilities. Additionally, we explore strategies such as Best-of-N sampling, backtracking mechanisms, and self-verification prompts to enhance logical reasoning performance. Our findings provide critical insights into the scaling behavior of LLMs, highlight their limitations, and outline potential directions for advancing their reasoning capabilities.
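As the abstract notes, ZebraLogic's puzzles are derived from constraint satisfaction problems. A minimal sketch (not the authors' generator) of how such a logic grid puzzle can be posed and solved as a CSP by exhaustive search, here with 3 houses and 2 attributes; the clue wording and attribute names are illustrative:

```python
from itertools import permutations

def solve():
    """Brute-force a toy 3-house logic grid puzzle as a CSP.

    Each attribute (color, pet) is an assignment of values to houses
    0..2, i.e. a permutation; the clues are constraints that prune
    candidate assignments. The raw search space is (3!)^2 = 36.
    """
    solutions = []
    for colors in permutations(["red", "green", "blue"]):
        for pets in permutations(["dog", "cat", "fish"]):
            # Clue 1: the red house is immediately left of the green house.
            if colors.index("red") + 1 != colors.index("green"):
                continue
            # Clue 2: the dog lives in the blue house.
            if pets[colors.index("blue")] != "dog":
                continue
            # Clue 3: the cat owner lives in the first house.
            if pets[0] != "cat":
                continue
            solutions.append((colors, pets))
    return solutions

print(solve())  # → [(('red', 'green', 'blue'), ('cat', 'fish', 'dog'))]
```

With n houses and m attributes the unconstrained search space is (n!)^m, which grows super-exponentially in n; this controllable blow-up is the knob behind the search space complexities the benchmark varies.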
Cite
Text
Lin et al. "ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Lin et al. "ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/lin2025icml-zebralogic/)
BibTeX
@inproceedings{lin2025icml-zebralogic,
title = {{ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning}},
author = {Lin, Bill Yuchen and Le Bras, Ronan and Richardson, Kyle and Sabharwal, Ashish and Poovendran, Radha and Clark, Peter and Choi, Yejin},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {37889--37905},
volume = {267},
url = {https://mlanthology.org/icml/2025/lin2025icml-zebralogic/}
}