ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering

Abstract

How well do AI systems perform in algorithm engineering for hard optimization problems in domains such as package-delivery routing, crew scheduling, factory production planning, and power-grid balancing? We introduce *ALE-Bench*, a new benchmark for evaluating AI systems on score-based algorithmic programming contests. Drawing on real tasks from the AtCoder Heuristic Contests, ALE-Bench presents optimization problems that are computationally hard and admit no known exact solution. Unlike short-duration, pass/fail coding benchmarks, ALE-Bench encourages iterative solution refinement over long time horizons. Our software framework supports interactive agent architectures that leverage test-run feedback and visualizations. Our evaluation of frontier LLMs revealed that while they achieve high performance on specific problems, a notable gap remains compared to humans in consistency across problems and in long-horizon problem-solving capability. This gap highlights the need for a benchmark like ALE-Bench to drive future AI advances.
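The iterative, score-based refinement that the abstract describes can be pictured as a propose-evaluate-keep-best loop: an agent submits a candidate solution, receives a contest-style score plus test-run feedback, and revises its solution within a fixed budget. The sketch below is an illustrative assumption only; the `refine_loop`, `propose`, `evaluate`, and `Attempt` names are hypothetical stand-ins and not the actual ALE-Bench package API.

```python
# Hypothetical sketch of a score-based iterative refinement loop in the style
# ALE-Bench encourages. All names and interfaces here are illustrative
# assumptions, not the real ALE-Bench framework API.

import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    code: str       # candidate solution source (or parameters)
    score: float    # contest-style objective score (higher is better)
    feedback: str   # e.g. test-run logs or visualization summaries


def refine_loop(
    propose: Callable[[list[Attempt]], str],        # agent: history -> new solution
    evaluate: Callable[[str], tuple[float, str]],   # runner: solution -> (score, feedback)
    budget: int = 10,
) -> Attempt:
    """Iteratively propose, evaluate, and keep the best-scoring attempt."""
    history: list[Attempt] = []
    best: Attempt | None = None
    for _ in range(budget):
        code = propose(history)
        score, feedback = evaluate(code)
        attempt = Attempt(code, score, feedback)
        history.append(attempt)
        if best is None or attempt.score > best.score:
            best = attempt
    return best


if __name__ == "__main__":
    # Toy stand-ins: the "agent" perturbs a numeric parameter and the "runner"
    # scores it against a hidden optimum -- a placeholder for compiling and
    # running a heuristic solver on contest test cases.
    def toy_propose(history: list[Attempt]) -> str:
        base = float(history[-1].code) if history else 0.0
        return str(base + random.uniform(-1.0, 2.0))

    def toy_evaluate(code: str) -> tuple[float, str]:
        x = float(code)
        score = -abs(x - 7.0)  # hidden optimum at 7.0
        return score, f"parameter={x:.2f}, score={score:.2f}"

    best = refine_loop(toy_propose, toy_evaluate, budget=20)
    print("best attempt:", best.feedback)
```

In this toy form the loop simply keeps the best-scoring attempt; an interactive agent would additionally condition its next proposal on the accumulated feedback in `history`, which is the long-horizon behavior the benchmark is designed to measure.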

Cite

Text

Imajuku et al. "ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering." Advances in Neural Information Processing Systems, 2025.

Markdown

[Imajuku et al. "ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/imajuku2025neurips-alebench/)

BibTeX

@inproceedings{imajuku2025neurips-alebench,
  title     = {{ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering}},
  author    = {Imajuku, Yuki and Horie, Kohki and Iwata, Yoichi and Aoki, Kensho and Takahashi, Naohiro and Akiba, Takuya},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/imajuku2025neurips-alebench/}
}