Rethinking Chain-of-Thought from the Perspective of Self-Training

Abstract

Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent capabilities in large language models (LLMs). We observe that CoT reasoning and self-training share a core objective: iteratively leveraging model-generated information to progressively reduce prediction uncertainty. Building on this insight, we propose a novel CoT framework to improve reasoning performance. The framework integrates two key components: (i) a task-specific prompt module that optimizes the initial reasoning process, and (ii) an adaptive reasoning iteration module that dynamically refines the reasoning process and addresses two limitations of previous CoT approaches, namely over-reasoning and high similarity between consecutive reasoning iterations. Extensive experiments show that the proposed method achieves significant advantages in both performance and computational efficiency. Our code is available at: https://github.com/zongqianwu/ST-COT.
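
The adaptive iteration idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal illustration under stated assumptions. The abstract does not say how uncertainty or similarity is measured, so the sketch assumes Shannon entropy over sampled answers as the uncertainty signal and token-level Jaccard overlap as the similarity signal. The names `sample_fn`, `entropy_tol`, and `sim_tol` are hypothetical.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical distribution of answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def jaccard(a, b):
    """Token-level Jaccard similarity between two reasoning strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def adaptive_iterations(sample_fn, max_iters=5, entropy_tol=0.3, sim_tol=0.9):
    """Run reasoning iterations until one of three stopping conditions holds.

    sample_fn(iteration, previous_reasoning) is a hypothetical callable that
    returns (reasoning_text, list_of_sampled_answers). In practice it would
    wrap an LLM call; here it is left abstract.

    Returns (majority_answer, stop_reason, history), where stop_reason is
    "confident" (entropy low enough, so further steps would be over-reasoning),
    "stalled" (new reasoning nearly repeats the previous one), or "budget".
    """
    history = []
    previous = None
    majority = None
    for it in range(max_iters):
        reasoning, answers = sample_fn(it, previous)
        majority = Counter(answers).most_common(1)[0][0]
        h = answer_entropy(answers)
        sim = jaccard(reasoning, previous) if previous is not None else 0.0
        history.append((it, h, sim))
        if h <= entropy_tol:
            # Uncertainty is already low: stop instead of over-reasoning.
            return majority, "confident", history
        if sim >= sim_tol:
            # Consecutive iterations are near-duplicates: no new information.
            return majority, "stalled", history
        previous = reasoning
    return majority, "budget", history
```

A caller would pass a `sample_fn` that prompts the model and samples several answers per iteration. On a "stalled" result, one plausible response (again an assumption, not something the abstract specifies) is to change the prompt rather than iterate further.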

Cite

Text

Wu et al. "Rethinking Chain-of-Thought from the Perspective of Self-Training." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Wu et al. "Rethinking Chain-of-Thought from the Perspective of Self-Training." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/wu2025icml-rethinking/)

BibTeX

@inproceedings{wu2025icml-rethinking,
  title     = {{Rethinking Chain-of-Thought from the Perspective of Self-Training}},
  author    = {Wu, Zongqian and Xu, Baoduo and Cui, Ruochen and Zhan, Mengmeng and Zhu, Xiaofeng and Feng, Lei},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {67917--67937},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/wu2025icml-rethinking/}
}