InnoGym: Benchmarking the Innovation Potential of AI Agents

Abstract

LLMs and agents have achieved impressive progress in code generation, mathematical reasoning, and scientific discovery. However, existing benchmarks primarily measure correctness, overlooking the diversity of methods behind solutions. True innovation depends not only on producing correct answers but also on the originality of the approach. We present InnoGym, the first benchmark and framework designed to systematically evaluate the innovation potential of AI agents. InnoGym introduces two complementary metrics: performance gain, which measures improvement over the best-known solutions, and novelty, which captures methodological differences from prior approaches. The benchmark includes 18 carefully curated tasks from real-world engineering and scientific domains, each standardized through resource filtering, evaluator validation, and solution collection. In addition, we provide iGym, a unified execution environment for reproducible and long-horizon evaluations. Extensive experiments show that while some agents produce novel approaches, their lack of robustness limits performance gains. These results highlight a key gap between creativity and effectiveness, underscoring the need for benchmarks that evaluate both.

Cite

Text

Zhang et al. "InnoGym: Benchmarking the Innovation Potential of AI Agents." International Conference on Learning Representations, 2026.

Markdown

[Zhang et al. "InnoGym: Benchmarking the Innovation Potential of AI Agents." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-innogym/)

BibTeX

@inproceedings{zhang2026iclr-innogym,
  title     = {{InnoGym: Benchmarking the Innovation Potential of AI Agents}},
  author    = {Zhang, Jintian and Xu, Kewei and Zheng, Jingsheng and Yu, Zhuoyun and Zhu, Yuqi and Luo, Yujie and Wei, Lanning and Qiao, Shuofei and Du, Lun and Zheng, Da and Deng, Shumin and Chen, Huajun and Zhang, Ningyu},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-innogym/}
}