ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
Abstract
We investigate the logical reasoning capabilities of Large Language Models (LLMs) and their scalability across complex deductive tasks. Using ZebraLogic, a newly developed benchmark dataset of logic grid puzzles derived from constraint satisfaction problems (CSPs), we systematically evaluate LLM performance. ZebraLogic spans a broad range of search space complexities and incorporates diverse logical constraints, providing a controlled environment to assess reasoning abilities. Our results reveal a significant decline in accuracy as problem complexity increases—a phenomenon we term the “curse of complexity.” Notably, this limitation persists even with scaling model size and inference-time computation, suggesting fundamental constraints in current LLM reasoning capabilities. Additionally, we explore strategies such as Best-of-N sampling, backtracking mechanisms, and self-verification prompts to enhance logical reasoning performance. Our findings provide critical insights into the scaling behavior of LLMs, highlight their limitations, and outline potential directions for advancing their reasoning capabilities.
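As the abstract notes, ZebraLogic's puzzles are derived from constraint satisfaction problems. A minimal sketch (not the authors' generator) of how such a logic grid puzzle can be posed and solved as a CSP by exhaustive search, here with 3 houses and 2 attributes; the clue wording and attribute names are illustrative:

```python
from itertools import permutations

def solve():
    """Brute-force a toy 3-house logic grid puzzle as a CSP.

    Each attribute (color, pet) is an assignment of values to houses
    0..2, i.e. a permutation; the clues are constraints that prune
    candidate assignments. The raw search space is (3!)^2 = 36.
    """
    solutions = []
    for colors in permutations(["red", "green", "blue"]):
        for pets in permutations(["dog", "cat", "fish"]):
            # Clue 1: the red house is immediately left of the green house.
            if colors.index("red") + 1 != colors.index("green"):
                continue
            # Clue 2: the dog lives in the blue house.
            if pets[colors.index("blue")] != "dog":
                continue
            # Clue 3: the cat owner lives in the first house.
            if pets[0] != "cat":
                continue
            solutions.append((colors, pets))
    return solutions

print(solve())  # → [(('red', 'green', 'blue'), ('cat', 'fish', 'dog'))]
```

With n houses and m attributes the unconstrained search space is (n!)^m, which grows super-exponentially in n; this controllable blow-up is the knob behind the search space complexities the benchmark varies.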
Cite
Text
Lin et al. "ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Lin et al. "ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/lin2025icml-zebralogic/)
BibTeX
@inproceedings{lin2025icml-zebralogic,
title = {{ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning}},
author = {Lin, Bill Yuchen and Le Bras, Ronan and Richardson, Kyle and Sabharwal, Ashish and Poovendran, Radha and Clark, Peter and Choi, Yejin},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {37889--37905},
volume = {267},
url = {https://mlanthology.org/icml/2025/lin2025icml-zebralogic/}
}