ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Feng, Jiazhan; Huang, Shijue; Qu, Xingwei; Zhang, Ge; Qin, Yujia; Zhong, Baoquan; Jiang, Chengquan; Chi, Jinxin; Zhong, Wanjun

ICLR 2026

Abstract

While reasoning models trained with reinforcement learning (RL) excel in reasoning, they struggle in scenarios requiring structured problem-solving, such as geometric reasoning, concise computation, or complex equation solving, areas where computational tools like code interpreters (CI) demonstrate distinct advantages. To bridge this gap, we propose ReTool, which enhances long-form reasoning with tool-integrated learning through two key features: (1) dynamic interleaving of real-time code execution within natural language reasoning processes, and (2) an automated RL paradigm that allows policy rollouts with multi-turn real-time code execution and teaches the model when and how to invoke tools based on outcome feedback. ReTool employs a systematic training framework, beginning with synthetic code-augmented long-form reasoning data for cold-start training. Subsequent RL training leverages task outcomes as rewards to iteratively refine the model's tool-use strategy, enabling autonomous discovery of optimal tool invocation patterns without human priors. Experiments on the challenging math Olympiad benchmark AIME demonstrate ReTool's superiority: our 32B model achieves 67% accuracy with 400 training steps, outperforming the text-based RL baseline (40% accuracy, 1080 steps) in both performance and efficiency. Remarkably, ReTool-32B attains 72.5% accuracy in extended settings, surpassing OpenAI's o1-preview by 27.9 percentage points. Further analysis reveals generalization to broader tool-use scenarios and emergent behaviors such as code self-correction, signaling an "aha moment" in which the model autonomously masters adaptive tool use. These findings highlight the promise of outcome-driven tool integration for advancing complex mathematical reasoning and offer new insights into hybrid neuro-symbolic systems.
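The two features named in the abstract, interleaved code execution during a rollout and a reward computed only from the final outcome, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `<code>` and `<interpreter>` tags, the `\boxed{}` answer format, the 1.0/0.0 reward values, and every function name here (`run_code`, `rollout`, `outcome_reward`, `toy_policy`) are assumptions made for illustration, and the hard-coded `toy_policy` stands in for a trained model.

```python
import contextlib
import io
import re

# Assumed tag format for tool calls; the abstract does not specify one.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_code(source: str) -> str:
    """Run a snippet and return its captured stdout, or the error text.

    exec() is NOT a sandbox; a real system would isolate execution.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def rollout(policy, prompt: str, max_turns: int = 4) -> str:
    """Interleave text generation with code execution for up to max_turns."""
    trajectory = prompt
    for _ in range(max_turns):
        segment = policy(trajectory)
        trajectory += segment
        match = CODE_RE.search(segment)
        if match is None:
            break  # no tool call in this segment: the rollout is finished
        # Feed the interpreter output back so the next segment can use it.
        trajectory += f"<interpreter>{run_code(match.group(1))}</interpreter>"
    return trajectory


def outcome_reward(trajectory: str, gold: str) -> float:
    """Score only the final boxed answer (assumed values: 1.0 or 0.0)."""
    answers = re.findall(r"\\boxed\{([^}]*)\}", trajectory)
    return 1.0 if answers and answers[-1].strip() == gold else 0.0


def toy_policy(trajectory: str) -> str:
    """Hard-coded stand-in for a model: call the tool once, then answer."""
    if "<interpreter>" not in trajectory:
        return " Let me compute this. <code>print(sum(range(1, 101)))</code>"
    result = re.findall(
        r"<interpreter>(.*?)</interpreter>", trajectory, re.DOTALL
    )[-1]
    return f" The answer is \\boxed{{{result}}}."


if __name__ == "__main__":
    traj = rollout(toy_policy, "What is 1 + 2 + ... + 100?")
    print(outcome_reward(traj, "5050"))  # prints 1.0
```

In this sketch the reward never inspects the code itself, only the final answer, which is the sense in which the tool-use strategy would be learned from outcome feedback rather than from hand-written rules about when to call the interpreter.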

Cite

Text

Feng et al. "ReTool: Reinforcement Learning for Strategic Tool Use in LLMs." International Conference on Learning Representations, 2026.

Markdown

[Feng et al. "ReTool: Reinforcement Learning for Strategic Tool Use in LLMs." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/feng2026iclr-retool/)

BibTeX

@inproceedings{feng2026iclr-retool,
  title     = {{ReTool: Reinforcement Learning for Strategic Tool Use in LLMs}},
  author    = {Feng, Jiazhan and Huang, Shijue and Qu, Xingwei and Zhang, Ge and Qin, Yujia and Zhong, Baoquan and Jiang, Chengquan and Chi, Jinxin and Zhong, Wanjun},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/feng2026iclr-retool/}
}