KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning

Abstract

Recent advances have demonstrated that integrating reinforcement learning with rule-based rewards can significantly enhance the reasoning capabilities of large language models (LLMs), even without supervised fine-tuning (SFT). However, prevalent reinforcement learning algorithms, such as GRPO and variants like DAPO, compute the advantage at too coarse a granularity. Specifically, they compute rollout-level advantages that assign the same value to every token within a sequence, failing to capture token-specific contributions. To address this limitation, we propose Key-token Advantage Estimation (KTAE), a novel algorithm that estimates fine-grained, token-level advantages without introducing additional models. KTAE leverages the correctness of sampled rollouts and applies statistical analysis to quantify how much each token within a sequence contributes to the final outcome. This token-level importance is then combined with the rollout-level advantage to obtain a finer-grained token-level advantage estimate. Empirical results show that models trained with GRPO+KTAE and DAPO+KTAE outperform baseline methods across five mathematical reasoning benchmarks. Notably, they achieve higher accuracy with shorter responses and even surpass R1-Distill-Qwen-1.5B using the same base model.
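The granularity issue described in the abstract can be illustrated with a minimal sketch. The first function computes a GRPO-style group-normalized advantage, which yields one scalar per rollout that is shared by all of its tokens. The second and third functions are purely hypothetical stand-ins: the abstract does not specify which statistical analysis KTAE uses or how the two terms are combined, so `token_association`, the additive combination, and the `weight` parameter are assumptions chosen for illustration, not the authors' method.

```python
from statistics import mean, pstdev


def rollout_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: reward normalized within the sampled group.

    Returns one scalar per rollout; every token in a rollout shares it.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def token_association(rollouts, rewards):
    """Hypothetical importance score (an assumption, not KTAE's statistic).

    For each token: fraction of correct rollouts containing it minus the
    fraction of incorrect rollouts containing it. Range is [-1, 1].
    """
    correct = [set(toks) for toks, r in zip(rollouts, rewards) if r > 0]
    wrong = [set(toks) for toks, r in zip(rollouts, rewards) if r <= 0]
    vocab = set().union(*correct, *wrong)

    def frac(group, tok):
        return sum(tok in s for s in group) / len(group) if group else 0.0

    return {tok: frac(correct, tok) - frac(wrong, tok) for tok in vocab}


def token_level_advantages(rollouts, rewards, weight=0.5):
    """Combine both terms additively (an illustrative assumption)."""
    adv = rollout_advantages(rewards)
    score = token_association(rollouts, rewards)
    return [[a + weight * score[tok] for tok in toks]
            for toks, a in zip(rollouts, adv)]


# Two toy rollouts for the same prompt, scored by a rule-based reward.
rollouts = [["2", "+", "2", "=", "4"], ["2", "+", "2", "=", "5"]]
rewards = [1.0, 0.0]

scores = token_association(rollouts, rewards)
token_adv = token_level_advantages(rollouts, rewards)
```

In this toy group, the shared tokens receive only the rollout-level value, while the tokens that differ between the correct and incorrect rollout ("4" and "5") are pushed further apart. No additional model is involved, only statistics over the sampled group.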

Cite

Text

Sun et al. "KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning." Advances in Neural Information Processing Systems, 2025.

Markdown

[Sun et al. "KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/sun2025neurips-ktae/)

BibTeX

@inproceedings{sun2025neurips-ktae,
  title     = {{KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning}},
  author    = {Sun, Wei and Yang, Wen and Jian, Pu and Du, Qianlong and Cui, Fuwei and Ren, Shuo and Zhang, Jiajun},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/sun2025neurips-ktae/}
}