Generalizable End-to-End Tool-Use RL with Synthetic CodeGym

Abstract

Tool-augmented large language models (LLMs), hereafter LLM agents, leverage external tools to solve diverse tasks and interface with the real world. However, current training practices largely rely on supervised fine-tuning (SFT) over static trajectories or reinforcement learning (RL) on narrow tasks. Agents trained this way generalize poorly beyond development settings and are brittle when faced with new tools and unseen workflows. Because code execution reflects many structural patterns of real-world workflows, we use coding problems as a structured substrate to build tool-use agent training environments with diverse task configurations. To this end, we introduce **CodeGym**, a scalable framework that synthesizes diverse, verifiable, and controllable multi-turn tool-use environments for agent RL, enabling LLM agents to actively explore and master a variety of workflows. CodeGym converts static coding problems into interactive environments by extracting atomic functions or logic into callable tools, yielding verifiable tasks that span a range of tool-execution workflows. Models of varying sizes and chain-of-thought configurations trained in CodeGym generalize consistently to out-of-distribution (OOD) settings; for example, Qwen2.5-32B-Instruct achieves an absolute accuracy gain of 8.7 points on the OOD benchmark $\tau$-Bench. These results highlight CodeGym as a step toward scalable general-purpose RL environments for training tool-use behaviors that align with real-world agent workflows. Our code is publicly available at https://github.com/StigLidu/CodeGym.
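
To make the idea of turning a static coding problem into an interactive environment more concrete, here is a minimal sketch. It is not the CodeGym implementation: the class `ToyToolEnv`, the tool names `get_length`, `get_item`, and `submit`, the turn budget, and the binary reward are all assumptions chosen for illustration. The static problem is "return the largest element of a list"; the list is hidden from the agent and can only be reached through tool calls, and the final answer is verified against a reference solution.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


def reference_solution(nums: List[int]) -> int:
    """Ground-truth solver for the static problem (used only for verification)."""
    return max(nums)


@dataclass
class ToyToolEnv:
    """Hypothetical multi-turn environment; the hidden input is visible only via tools."""
    nums: List[int]
    max_turns: int = 20   # assumed turn budget, not taken from the paper
    turns: int = 0
    done: bool = False
    reward: float = 0.0

    def step(self, tool: str, **args: Any) -> Dict[str, Any]:
        """Execute one tool call and return an observation dictionary."""
        if self.done:
            return {"error": "episode finished"}
        self.turns += 1
        if tool == "get_length":
            obs = {"result": len(self.nums)}
        elif tool == "get_item":
            i = args.get("index")
            if isinstance(i, int) and 0 <= i < len(self.nums):
                obs = {"result": self.nums[i]}
            else:
                obs = {"error": "index out of range"}
        elif tool == "submit":
            # Verifiable outcome: compare the answer with the reference solution.
            self.done = True
            self.reward = float(args.get("answer") == reference_solution(self.nums))
            obs = {"result": "submitted"}
        else:
            obs = {"error": f"unknown tool: {tool}"}
        if self.turns >= self.max_turns:
            self.done = True  # out of budget; reward stays 0.0 unless already submitted
        return obs


def scripted_agent(env: ToyToolEnv) -> float:
    """Stand-in for an LLM policy: scans the list through tool calls, then submits."""
    n = env.step("get_length")["result"]
    best = env.step("get_item", index=0)["result"]
    for i in range(1, n):
        best = max(best, env.step("get_item", index=i)["result"])
    env.step("submit", answer=best)
    return env.reward


if __name__ == "__main__":
    env = ToyToolEnv(nums=[3, 9, 2])
    print(scripted_agent(env), env.turns)
```

In an RL setting, the scripted agent would be replaced by a model that chooses the tool calls itself, and the reward from `submit` would serve as the training signal. The point of the sketch is only that the outcome stays automatically checkable while the workflow becomes multi-turn.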

Cite

Text

Du et al. "Generalizable End-to-End Tool-Use RL with Synthetic CodeGym." International Conference on Learning Representations, 2026.

Markdown

[Du et al. "Generalizable End-to-End Tool-Use RL with Synthetic CodeGym." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/du2026iclr-generalizable/)

BibTeX

@inproceedings{du2026iclr-generalizable,
  title     = {{Generalizable End-to-End Tool-Use RL with Synthetic CodeGym}},
  author    = {Du, Weihua and Gong, Hailei and Ling, Zhan and Liu, Kang and Shen, Lingfeng and Yao, Xuesong and Xu, Yufei and Shi, Dingyuan and Yang, Yiming and Chen, Jiecao},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/du2026iclr-generalizable/}
}