Dai, Mz

1 publication

NeurIPS 2025. S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models. Mz Dai, Chenxu Yang, Qingyi Si