Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning

Xu, Ran; Chen, Jingjing; Ye, Jiayu; Wu, Yu; Yan, Jun; Yang, Carl; Yu, Hongkun

ICLR 2026

Abstract

Large Language Models (LLMs) are widely used as judges to evaluate response quality, providing a scalable alternative to human evaluation. However, most LLM judges operate solely on intrinsic text-based reasoning, limiting their ability to verify complex constraints or perform accurate computation. Motivated by the success of tool-integrated reasoning (TIR) in numerous tasks, we propose TIR-Judge, an end-to-end RL framework for training LLM judges that integrates a Python executor for precise evaluation. TIR-Judge is built on three principles: (i) diverse training across verifiable and non-verifiable domains, (ii) flexible judgment formats (pointwise, pairwise, listwise), and (iii) iterative RL that enables bootstrapping directly from a base model without distillation. On six public benchmarks, TIR-Judge surpasses strong reasoning-based judges by up to 6.4% (pointwise) and 7.7% (pairwise), and achieves listwise performance comparable to Claude-Opus-4 despite having only 8B parameters. Remarkably, TIR-Judge-Zero—trained entirely without distillation—matches the performance of the distilled variants, showing that tool-augmented judges can self-improve through iterative reinforcement learning.
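
To make the idea of a tool-integrated judge concrete, the following is a minimal sketch, not the authors' implementation. It assumes a pairwise setting and a verifiable constraint (exactly five words), and it replaces the model-written code with a fixed check template. The names `run_python`, `pairwise_judge`, and `CHECK` are hypothetical, and the executor shown here has no sandboxing, which a real system would need.

```python
import contextlib
import io


def run_python(code: str) -> str:
    """Execute a snippet and return what it printed.

    Hypothetical stand-in for a Python executor tool; no sandboxing.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()


def pairwise_judge(check_template: str, response_a: str, response_b: str) -> str:
    """Return "A", "B", or "tie" from an executed check rather than text-only reasoning."""
    results = {}
    for label, response in (("A", response_a), ("B", response_b)):
        # In a trained judge the model would write this code; here it is a fixed template.
        code = check_template.format(response=repr(response))
        results[label] = run_python(code) == "True"
    if results["A"] == results["B"]:
        return "tie"
    return "A" if results["A"] else "B"


# Illustrative constraint (an assumption): the response must contain exactly five words.
CHECK = "print(len({response}.split()) == 5)"

verdict = pairwise_judge(CHECK, "Tools make judges more precise", "Short answer")
print(verdict)
```

The point of the sketch is only that the verdict depends on the executed result of a check, so a constraint such as a word count is computed rather than estimated from the text.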

Cite

Text

Xu et al. "Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning." International Conference on Learning Representations, 2026.

Markdown

[Xu et al. "Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/xu2026iclr-incentivizing/)

BibTeX

@inproceedings{xu2026iclr-incentivizing,
  title     = {{Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning}},
  author    = {Xu, Ran and Chen, Jingjing and Ye, Jiayu and Wu, Yu and Yan, Jun and Yang, Carl and Yu, Hongkun},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/xu2026iclr-incentivizing/}
}