AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints

Zeng, Yirong; Ding, Xiao; Liu, Yufei; Wang, Yuxian; Du, Qunyao; Hou, Yutai; Ning, Wu; Song, Haonan; Tang, Duyu; Tu, Dandan; Qin, Bing; Liu, Ting

ICLR 2026

Abstract

Tool use represents a critical capability for AI agents, with recent advances focusing on leveraging reinforcement learning (RL) for test-time scaling to achieve better performance through more deliberate reasoning. However, current RL-based scaling approaches face two key challenges: (a) direct RL training often struggles to scale up thinking length sufficiently to solve complex problems, and (b) scaled-up models tend to overthink simpler problems, resulting in substantial token inefficiency. To address these challenges, we propose a novel training paradigm that first employs warm-up supervised fine-tuning to help models distinguish between simple and complex problems, followed by RL that enables models to automatically determine appropriate reasoning trajectories. Furthermore, to tackle the issue of automatic thinking-length scaling, we find that entropy-based optimization objectives effectively maintain model diversity while unlocking the model's scaling capabilities. Based on this insight, we introduce an entropy-based long-short reasoning fusion RL strategy. Our experiments on three benchmarks demonstrate that the model achieves auto-scaling for efficient tool use, improving accuracy by 9.8% while reducing computational overhead by roughly 81%.

Cite

Text

Zeng et al. "AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints." International Conference on Learning Representations, 2026.

Markdown

[Zeng et al. "AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zeng2026iclr-autotool/)

BibTeX

@inproceedings{zeng2026iclr-autotool,
  title     = {{AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints}},
  author    = {Zeng, Yirong and Ding, Xiao and Liu, Yufei and Wang, Yuxian and Du, Qunyao and Hou, Yutai and Ning, Wu and Song, Haonan and Tang, Duyu and Tu, Dandan and Qin, Bing and Liu, Ting},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zeng2026iclr-autotool/}
}