SWE-Bench Goes Live!
Abstract
The issue-resolving task, where a model generates patches to fix real-world bugs, has emerged as a key benchmark for evaluating the capabilities of large language models (LLMs). While SWE-bench has become the dominant benchmark in this domain, it suffers from several limitations: it has not been updated since its release, is restricted to only 12 repositories, and relies heavily on manual effort for constructing test instances and setting up executable environments, significantly limiting its scalability. We present SWE-bench-Live, a live-updatable benchmark designed to address these limitations. SWE-bench-Live currently includes 1,890 tasks derived from real GitHub issues created since 2024, spanning 223 repositories. Each task is accompanied by a dedicated Docker image to ensure reproducible execution. Additionally, we introduce an automated curation pipeline that streamlines the entire process from instance creation to environment setup, removing manual bottlenecks and enabling scalability and continuous updates. We evaluate a range of state-of-the-art models and agent frameworks on SWE-bench-Live, offering detailed empirical insights into their real-world bug-fixing capabilities. By providing a fresh, diverse, and executable benchmark grounded in live repository activity, SWE-bench-Live supports reliable, large-scale assessment of code LLMs and code agents in realistic development settings.
Cite
Text
Zhang et al. "SWE-Bench Goes Live!." Advances in Neural Information Processing Systems, 2025.Markdown
[Zhang et al. "SWE-Bench Goes Live!." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-swebench/)BibTeX
@inproceedings{zhang2025neurips-swebench,
title = {{SWE-Bench Goes Live!}},
author = {Zhang, Linghao and He, Shilin and Zhang, Chaoyun and Kang, Yu and Li, Bowen and Xie, Chengxing and Wang, Junhao and Wang, Maoquan and Huang, Yufan and Fu, Shengyu and Nallipogu, Elsie and Lin, Qingwei and Dang, Yingnong and Rajmohan, Saravan and Zhang, Dongmei},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/zhang2025neurips-swebench/}
}