THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

Chang, Qikai; Zhang, Zhenrong; Hu, Pengfei; Du, Jun; Ma, Jiefeng; Pan, Yicheng; Zhang, Jianshu; Liu, Quan; Gao, Jianqing

ICLR 2026

Abstract

Large Language Models (LLMs) have made remarkable progress in mathematical reasoning, but they still struggle with high-precision tasks such as numerical computation and formal symbolic manipulation. Integrating external tools has emerged as a promising approach to bridge this gap. Despite recent advances, existing methods face three key challenges: constructing tool-integrated reasoning data, performing fine-grained optimization, and enhancing inference. To overcome these limitations, we propose THOR (Tool-Integrated Hierarchical Optimization via RL). First, we introduce TIRGen, a multi-agent pipeline for constructing high-quality datasets of tool-integrated reasoning paths that align with the policy and generalize well across diverse models. Second, to perform fine-grained hierarchical optimization, we introduce an RL strategy that jointly optimizes episode-level problem solving and step-level code generation. This is motivated by our key insight that the success of an intermediate tool call is a strong predictor of the final answer's correctness. Finally, THOR incorporates a self-correction mechanism that leverages immediate tool feedback to dynamically revise erroneous reasoning paths during inference. Our approach generalizes well across diverse models, performing effectively on both reasoning and non-reasoning models. It further achieves state-of-the-art performance among models of similar scale on multiple mathematical benchmarks, while also delivering consistent improvements on code benchmarks. Our code will be publicly available at https://github.com/JingMog/THOR.
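
The idea of pairing an episode-level signal with a step-level signal can be illustrated with a minimal sketch. The abstract does not give the reward formulation, so everything here is an assumption for illustration only: the `ToolStep` and `Episode` structures, the +1/-1 step values, the exact-match answer check, and the additive `step_weight` combination are hypothetical and should not be read as the THOR implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolStep:
    """One reasoning step that ends in a tool call (hypothetical structure)."""
    code: str
    executed_ok: bool  # True if the interpreter ran the code without error


@dataclass
class Episode:
    """A full tool-integrated reasoning path for one problem (hypothetical)."""
    steps: List[ToolStep]
    final_answer: str
    reference_answer: str


def episode_reward(ep: Episode) -> float:
    """Episode-level signal: 1.0 if the final answer matches the reference."""
    return 1.0 if ep.final_answer.strip() == ep.reference_answer.strip() else 0.0


def step_rewards(ep: Episode) -> List[float]:
    """Step-level signal: +1.0 for a tool call that ran, -1.0 for one that failed."""
    return [1.0 if s.executed_ok else -1.0 for s in ep.steps]


def combined_signal(ep: Episode, step_weight: float = 0.5) -> List[float]:
    """Assumed combination: episode reward plus a weighted per-step reward."""
    r_ep = episode_reward(ep)
    return [r_ep + step_weight * r for r in step_rewards(ep)]
```

In this toy setup, a path with one successful and one failed tool call that still reaches the correct answer gives its two steps different signals, which is the kind of fine-grained credit the step-level term is meant to supply.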

Cite

Text

Chang et al. "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning." International Conference on Learning Representations, 2026.

Markdown

[Chang et al. "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/chang2026iclr-thor/)

BibTeX

@inproceedings{chang2026iclr-thor,
  title     = {{THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning}},
  author    = {Chang, Qikai and Zhang, Zhenrong and Hu, Pengfei and Du, Jun and Ma, Jiefeng and Pan, Yicheng and Zhang, Jianshu and Liu, Quan and Gao, Jianqing},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/chang2026iclr-thor/}
}