Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation

Abstract

A key paradigm to improve the reasoning capabilities of large language models (LLMs) is to allocate more inference-time compute to search against a verifier or reward model. This process can then be utilized to refine the pretrained model or distill its reasoning patterns into more efficient models. In this paper, we study inference-time computation by viewing chain-of-thought (CoT) generation as a metastable Markov process: easy reasoning steps (e.g., algebraic manipulations) form densely connected clusters, while hard reasoning steps (e.g., applying a relevant theorem) create sparse, low-probability edges between clusters, leading to phase transitions at longer timescales. Under this framework, we prove that implementing a search protocol that rewards sparse edges improves CoT by decreasing the expected number of steps to reach different clusters. In contrast, we establish a limit on reasoning capability when the model is restricted to local information of the pretrained graph. We also show that the information gained by search can be utilized to obtain a better reasoning model: (1) the pretrained model can be directly finetuned to favor sparse edges via policy gradient methods, and moreover (2) a compressed metastable representation of the reasoning dynamics can be distilled into a smaller, more efficient model.
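The abstract's core picture is dense clusters of easy steps joined by a rare hard step. A toy chain makes this concrete. The sketch below is our own illustration under stated assumptions, not the construction or search protocol analyzed in the paper. It uses a hypothetical 6-state chain with two 3-state clusters and a single sparse edge from state 2 to state 3, taken with probability `eps`. Solving the standard first-step equations for the expected hitting time exposes the long 1/eps timescale. The helper `reweight_sparse_edges` is a hypothetical stand-in for "a search protocol that rewards sparse edges": it multiplies the sparse edge's probability by a boost factor and renormalizes the row.

```python
def build_chain(eps: float) -> list[list[float]]:
    """Hypothetical 6-state chain: cluster A = {0,1,2}, cluster B = {3,4,5}.

    Inside each cluster a state moves uniformly to one of the two other
    states of its cluster ("easy" steps). The only inter-cluster edge is
    the sparse "hard" step 2 -> 3, taken with probability eps.
    """
    n = 6
    P = [[0.0] * n for _ in range(n)]
    for cluster in ([0, 1, 2], [3, 4, 5]):
        for i in cluster:
            others = [j for j in cluster if j != i]
            for j in others:
                P[i][j] = 1.0 / len(others)
    # Carve the sparse edge 2 -> 3 out of state 2's in-cluster mass.
    P[2] = [p * (1.0 - eps) for p in P[2]]
    P[2][3] += eps
    return P


def expected_hitting_time(P: list[list[float]], start: int, targets: set[int]) -> float:
    """Solve h(i) = 1 + sum_j P[i][j] h(j) off the targets, with h = 0 on targets."""
    states = [i for i in range(len(P)) if i not in targets]
    idx = {s: k for k, s in enumerate(states)}
    m = len(states)
    # Augmented matrix for (I - Q) h = 1, where Q is P restricted to non-target states.
    A = [[0.0] * (m + 1) for _ in range(m)]
    for s in states:
        r = idx[s]
        A[r][r] += 1.0
        for t in states:
            A[r][idx[t]] -= P[s][t]
        A[r][m] = 1.0
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(m):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, m + 1):
                    A[r][c] -= factor * A[col][c]
    k = idx[start]
    return A[k][m] / A[k][k]


def reweight_sparse_edges(P, sparse_edges, boost: float):
    """Hypothetical search-style tilt: scale each sparse edge by `boost`, renormalize rows."""
    Q = [row[:] for row in P]
    for i, j in sparse_edges:
        Q[i][j] *= boost
    return [[p / sum(row) for p in row] for row in Q]


if __name__ == "__main__":
    B = {3, 4, 5}
    for eps in (0.1, 0.01):
        h = expected_hitting_time(build_chain(eps), start=0, targets=B)
        print(f"eps={eps}: E[steps to cluster B] = {h:.1f}")  # 30.0, then 300.0
    Q = reweight_sparse_edges(build_chain(0.01), [(2, 3)], boost=10.0)
    h = expected_hitting_time(Q, start=0, targets=B)
    print(f"eps=0.01, boost=10: E[steps to cluster B] = {h:.1f}")  # 32.7
```

For this particular chain the first-step equations give h(0) = 3/eps exactly. Shrinking `eps` tenfold therefore multiplies the expected time to leave cluster A tenfold. The boost of 10 raises the effective sparse-edge probability from 0.01 to 0.1/1.09, which cuts the expected number of steps from 300 to 32.7. This is only a qualitative analogue of the abstract's claim, not a reproduction of the paper's bounds.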

Cite

Text

Kim et al. "Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation." Proceedings of the 42nd International Conference on Machine Learning, 2025.


BibTeX

@inproceedings{kim2025icml-metastable,
  title     = {{Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation}},
  author    = {Kim, Juno and Wu, Denny and Lee, Jason D. and Suzuki, Taiji},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {30791-30825},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/kim2025icml-metastable/}
}