Training Software Engineering Agents and Verifiers with SWE-Gym

Abstract

We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language. We use SWE-Gym to train language-model-based SWE agents, achieving up to 19% absolute gains in resolve rate on the popular SWE-Bench Verified and Lite test sets. We also experiment with inference-time scaling through verifiers trained on agent trajectories sampled from SWE-Gym. When combined with our fine-tuned SWE agents, we achieve 32.0% and 26.0% on SWE-Bench Verified and Lite, respectively, reflecting a new state of the art for open-weight SWE agents. To facilitate further research, we publicly release SWE-Gym, models, and agent trajectories.
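
As a rough illustration of the verifier-based inference-time scaling mentioned above, the sketch below samples several agent attempts per task and keeps the one a verifier scores highest, then reports resolve rate as the fraction of tasks whose tests pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Trajectory` type, the `best_of_n` and `resolve_rate` helpers, and the idea that the verifier returns a single float score are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Trajectory:
    """One sampled agent attempt at a task (hypothetical structure)."""
    patch: str  # candidate code change produced by the agent
    log: str    # the agent's interaction history


def best_of_n(
    candidates: Sequence[Trajectory],
    verifier: Callable[[Trajectory], float],
) -> Trajectory:
    """Return the candidate that the verifier scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate trajectory")
    return max(candidates, key=verifier)


def resolve_rate(resolved: Sequence[bool]) -> float:
    """Fraction of task instances whose unit tests pass after the chosen patch."""
    return sum(resolved) / len(resolved) if resolved else 0.0
```

In this framing, sampling more candidates per task spends more inference compute, and the quality of the verifier determines how much of that extra compute turns into a higher resolve rate.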

Cite

Text

Pan et al. "Training Software Engineering Agents and Verifiers with SWE-Gym." ICLR 2025 Workshops: DL4C, 2025.

Markdown

[Pan et al. "Training Software Engineering Agents and Verifiers with SWE-Gym." ICLR 2025 Workshops: DL4C, 2025.](https://mlanthology.org/iclrw/2025/pan2025iclrw-training/)

BibTeX

@inproceedings{pan2025iclrw-training,
  title     = {{Training Software Engineering Agents and Verifiers with SWE-Gym}},
  author    = {Pan, Jiayi and Wang, Xingyao and Neubig, Graham and Jaitly, Navdeep and Ji, Heng and Suhr, Alane and Zhang, Yizhe},
  booktitle = {ICLR 2025 Workshops: DL4C},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/pan2025iclrw-training/}
}