SWINGARENA: Adversarial Programming Arena for Long-Context GitHub Issue Solving

Xu, Wendong; Xiong, Jing; Zhao, Chenyang; Chen, Qiujiang; Wang, Haoran; Shen, Hui; Wan, Zhongwei; Dai, Jianbo; Wu, Taiqiang; Xiao, He; Tao, Chaofan; Mao, Zhuoqing; Sheng, Ying; Guo, Zhijiang; Yang, Hongxia; Yu, Bei; Kong, Lingpeng; Gu, Quanquan; Wong, Ngai

ICLR 2026

Abstract

We present SwingArena, an adversarial evaluation framework for Large Language Models (LLMs) that closely mirrors real-world software development workflows. Unlike traditional static benchmarks, SwingArena models the collaborative process of software iteration by pairing LLMs as submitters, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines. To support these interactive evaluations, we introduce a retrieval-augmented code generation (RACG) module that efficiently handles long-context challenges by providing syntactically and semantically relevant code snippets from large codebases, supporting multiple programming languages (C++, Python, Rust, and Go). This enables the framework to scale across diverse tasks and contexts while respecting token limitations. Our experiments, using over 400 high-quality real-world GitHub issues selected from a pool of 2,300 issues, show that models like GPT-4o excel at aggressive patch generation, whereas DeepSeek and Gemini prioritize correctness in CI validation. SwingArena presents a scalable and extensible methodology for evaluating LLMs in realistic, CI-driven software development settings.
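To make the retrieval step concrete, the following is a minimal sketch of selecting relevant snippets under a token limit. It is not the paper's RACG implementation: the `Snippet` type, the Jaccard-overlap `score` function, the identifier-based token count, and the greedy packing in `select_snippets` are all assumptions introduced here for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass
class Snippet:
    """Hypothetical container for a retrieved code fragment."""
    path: str
    text: str


def tokenize(text: str) -> list[str]:
    # Crude identifier-level tokenizer; a real system would use the model's tokenizer.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower())


def score(issue: str, snippet: Snippet) -> float:
    # Assumed relevance measure: Jaccard overlap between issue and snippet terms.
    issue_terms = set(tokenize(issue))
    snippet_terms = set(tokenize(snippet.text))
    if not issue_terms or not snippet_terms:
        return 0.0
    return len(issue_terms & snippet_terms) / len(issue_terms | snippet_terms)


def select_snippets(issue: str, snippets: list[Snippet], token_budget: int) -> list[Snippet]:
    """Greedily keep the highest-scoring snippets that fit within the budget."""
    ranked = sorted(snippets, key=lambda s: score(issue, s), reverse=True)
    chosen: list[Snippet] = []
    used = 0
    for snippet in ranked:
        if score(issue, snippet) == 0.0:
            continue  # skip snippets with no term overlap at all
        cost = len(tokenize(snippet.text))
        if used + cost <= token_budget:
            chosen.append(snippet)
            used += cost
    return chosen


issue_text = "parse_config crashes on empty file"
corpus = [
    Snippet("a.py", "def parse_config(path): return open(path).read()"),
    Snippet("b.py", "def render(html): return html"),
    Snippet("c.py", "def load_file(file): # empty file crashes parse_config\n    return parse_config(file)"),
]
selected_paths = [s.path for s in select_snippets(issue_text, corpus, token_budget=20)]
```

With a budget of 20 tokens, both overlapping snippets fit and the unrelated `b.py` is dropped. Shrinking the budget forces the sketch to skip snippets that no longer fit, which is the trade-off the abstract refers to as respecting token limitations.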

Cite

Text

Xu et al. "SWINGARENA: Adversarial Programming Arena for Long-Context GitHub Issue Solving." International Conference on Learning Representations, 2026.

Markdown

[Xu et al. "SWINGARENA: Adversarial Programming Arena for Long-Context GitHub Issue Solving." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/xu2026iclr-swingarena/)

BibTeX

@inproceedings{xu2026iclr-swingarena,
  title     = {{SWINGARENA: Adversarial Programming Arena for Long-Context GitHub Issue Solving}},
  author    = {Xu, Wendong and Xiong, Jing and Zhao, Chenyang and Chen, Qiujiang and Wang, Haoran and Shen, Hui and Wan, Zhongwei and Dai, Jianbo and Wu, Taiqiang and Xiao, He and Tao, Chaofan and Mao, Zhuoqing and Sheng, Ying and Guo, Zhijiang and Yang, Hongxia and Yu, Bei and Kong, Lingpeng and Gu, Quanquan and Wong, Ngai},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/xu2026iclr-swingarena/}
}