Training Software Engineering Agents and Verifiers with SWE-Gym
Abstract
We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language. We use SWE-Gym to train language-model-based SWE agents, achieving up to 19% absolute gains in resolve rate on the popular SWE-Bench Verified and Lite test sets. We also experiment with inference-time scaling through verifiers trained on agent trajectories sampled from SWE-Gym. When combined with our fine-tuned SWE agents, we achieve 32.0% and 26.0% on SWE-Bench Verified and Lite, respectively, reflecting a new state-of-the-art for open-weight SWE agents. To facilitate further research, we publicly release SWE-Gym, models, and agent trajectories.
Cite
Text
Pan et al. "Training Software Engineering Agents and Verifiers with SWE-Gym." ICLR 2025 Workshops: DL4C, 2025.
Markdown
[Pan et al. "Training Software Engineering Agents and Verifiers with SWE-Gym." ICLR 2025 Workshops: DL4C, 2025.](https://mlanthology.org/iclrw/2025/pan2025iclrw-training/)
BibTeX
@inproceedings{pan2025iclrw-training,
title = {{Training Software Engineering Agents and Verifiers with SWE-Gym}},
author = {Pan, Jiayi and Wang, Xingyao and Neubig, Graham and Jaitly, Navdeep and Ji, Heng and Suhr, Alane and Zhang, Yizhe},
booktitle = {ICLR 2025 Workshops: DL4C},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/pan2025iclrw-training/}
}