Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling

Jiang, Shuyang; Liao, Yusheng; Zhang, Ya; Wang, Yanfeng; Wang, Yu

ICLR 2026

Abstract

While large reasoning models trained with critic-free reinforcement learning and verifiable rewards (RLVR) represent the state of the art, their practical utility is hampered by "overthinking", a critical issue where models generate excessively long reasoning paths without any performance benefit. Existing solutions that penalize length often fail, inducing performance degradation due to a fundamental misalignment between trajectory-level rewards and token-level optimization. In this work, we introduce a novel framework, DECS, built on our theoretical discovery of two previously unaddressed flaws in current length rewards: (1) the erroneous penalization of essential exploratory tokens and (2) the inadvertent rewarding of partial redundancy. Our framework's innovations include (i) a first-of-its-kind decoupled token-level reward mechanism that surgically distinguishes and penalizes redundant tokens, and (ii) a novel curriculum batch scheduling strategy to master the efficiency-efficacy equilibrium. Experimental results show that DECS reduces reasoning tokens by over 50% across seven benchmarks while maintaining or even improving performance. This demonstrates that substantial gains in reasoning efficiency can be achieved without compromising a model's underlying reasoning power. Code is available at https://github.com/pixas/DECS.
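
The contrast the abstract draws between trajectory-level length rewards and a decoupled token-level reward can be illustrated with a toy sketch. This is not the DECS formulation, which the abstract does not spell out; the function names, the `redundant_start` boundary index, and the penalty values are all hypothetical, and the example assumes the position where redundancy begins is already known.

```python
from typing import List


def trajectory_level_rewards(num_tokens: int, correct: bool,
                             length_penalty: float) -> List[float]:
    """Toy trajectory-level length reward (illustrative assumption).

    One scalar is computed for the whole response and shared by every
    token, so essential exploratory tokens are penalized together with
    redundant ones.
    """
    reward = (1.0 if correct else 0.0) - length_penalty * num_tokens
    return [reward] * num_tokens


def decoupled_token_rewards(num_tokens: int, correct: bool,
                            redundant_start: int,
                            penalty: float) -> List[float]:
    """Toy decoupled token-level reward (illustrative assumption).

    Tokens before the hypothetical `redundant_start` index keep the full
    correctness reward; only tokens at or after it receive the penalty.
    """
    base = 1.0 if correct else 0.0
    return [base if i < redundant_start else base - penalty
            for i in range(num_tokens)]


if __name__ == "__main__":
    # Five-token response whose last two tokens are assumed redundant.
    print(trajectory_level_rewards(5, True, 0.125))
    print(decoupled_token_rewards(5, True, redundant_start=3, penalty=0.5))
```

In the first function every token shares the same penalized value, which mirrors the misalignment the abstract describes; in the second, only the tail that is assumed redundant is penalized.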

Cite

Text

Jiang et al. "Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling." International Conference on Learning Representations, 2026.

BibTeX

@inproceedings{jiang2026iclr-overthinking,
  title     = {{Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling}},
  author    = {Jiang, Shuyang and Liao, Yusheng and Zhang, Ya and Wang, Yanfeng and Wang, Yu},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/jiang2026iclr-overthinking/}
}