SokoBench: Evaluating Long-Horizon Planning and Reasoning in Large Language Models

Monti, Sebastiano; Nicolini, Carlo; Pellegrini, Giovanni; Staiano, Jacopo; Lepri, Bruno

TMLR 2026

Abstract

Although the capabilities of Large Language Models (LLMs) and Large Reasoning Models (LRMs) have been increasingly tested on complex reasoning tasks, their long-horizon planning abilities have not yet been extensively investigated. In this work, we provide a systematic assessment of the planning and long-horizon reasoning capabilities of state-of-the-art LRMs. We propose a novel benchmark based on Sokoban puzzles, intentionally simplified to isolate long-horizon planning from state persistence. Our findings reveal a consistent degradation in planning performance when more than 25 moves are required to reach the solution, suggesting non-recoverable error accumulation under single-pass autoregressive decoding. We show that equipping LRMs with Planning Domain Definition Language (PDDL) parsing, validation, and solving tools yields modest improvements, suggesting that limitations in character-level counting and in long yet simple state tracking might not be overcome by test-time scaling approaches alone.
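
To make the notion of non-recoverable error accumulation concrete, the following is a minimal sketch of how a proposed move sequence for a Sokoban puzzle could be checked step by step. It is not the benchmark's actual code or level format: the symbols (`#` wall, `@` player, `$` box, `.` goal), the `U`/`D`/`L`/`R` move encoding, and the function names `parse` and `validate` are assumptions chosen for illustration, and combined symbols such as a box already on a goal are deliberately ignored.

```python
# Hypothetical sketch: common Sokoban symbols, NOT the benchmark's own format.
WALL, GOAL, BOX, PLAYER = "#", ".", "$", "@"
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def parse(level):
    """Split a text level into walls, goals, boxes, and the player position."""
    walls, goals, boxes, player = set(), set(), set(), None
    for r, row in enumerate(level.splitlines()):
        for c, ch in enumerate(row):
            if ch == WALL:
                walls.add((r, c))
            elif ch == GOAL:
                goals.add((r, c))
            elif ch == BOX:
                boxes.add((r, c))
            elif ch == PLAYER:
                player = (r, c)
    return walls, goals, boxes, player


def validate(level, plan):
    """Return (solved, steps_applied); stop at the first illegal move."""
    walls, goals, boxes, player = parse(level)
    for i, move in enumerate(plan):
        if move not in MOVES:
            return False, i  # unknown move symbol
        dr, dc = MOVES[move]
        nxt = (player[0] + dr, player[1] + dc)
        if nxt in walls:
            return False, i  # walking into a wall
        if nxt in boxes:
            beyond = (nxt[0] + dr, nxt[1] + dc)
            if beyond in walls or beyond in boxes:
                return False, i  # box cannot be pushed
            boxes.remove(nxt)
            boxes.add(beyond)
        player = nxt
    # Solved only if every box sits on a goal after the full plan.
    return boxes == goals, len(plan)


# A one-row corridor: the player must push the box three cells to the right.
corridor = "#######\n#@$  .#\n#######"
```

On this corridor, `validate(corridor, "RRR")` reports a solved puzzle after three moves, while a plan that is one move too short or one move too long fails. A single miscounted step invalidates the whole plan, which is the kind of failure that grows more likely as the required number of moves increases.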

Cite

Text

Monti et al. "SokoBench: Evaluating Long-Horizon Planning and Reasoning in Large Language Models." Transactions on Machine Learning Research, 2026.

Markdown

[Monti et al. "SokoBench: Evaluating Long-Horizon Planning and Reasoning in Large Language Models." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/monti2026tmlr-sokobench/)

BibTeX

@article{monti2026tmlr-sokobench,
  title     = {{SokoBench: Evaluating Long-Horizon Planning and Reasoning in Large Language Models}},
  author    = {Monti, Sebastiano and Nicolini, Carlo and Pellegrini, Giovanni and Staiano, Jacopo and Lepri, Bruno},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/monti2026tmlr-sokobench/}
}