AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses

Abstract

We introduce AutoAdvExBench, a benchmark to evaluate whether large language models (LLMs) can autonomously exploit defenses to adversarial examples. Unlike existing security benchmarks that often serve as proxies for real-world tasks, AutoAdvExBench directly measures LLMs’ success on tasks regularly performed by machine learning security experts. This approach offers a significant advantage: if an LLM could solve the challenges presented in AutoAdvExBench, it would immediately be of practical use to adversarial machine learning researchers. While our strongest ensemble of agents can break 87% of CTF-like ("homework exercise") adversarial example defenses, it breaks just 37% of real-world defenses, indicating a large gap between the difficulty of attacking "real" code and CTF-like code. Moreover, LLMs that are good at CTFs are not always good at real-world defenses; for example, Claude Sonnet 3.5 has a nearly identical attack success rate to Opus 4 on the CTF-like defenses (75% vs 79%), but on the real-world defenses Sonnet 3.5 breaks just 13% of defenses compared to Opus 4’s 30%. We make this benchmark available at https://github.com/ethz-spylab/AutoAdvExBench.

Cite

Text

Carlini et al. "AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Carlini et al. "AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/carlini2025icml-autoadvexbench/)

BibTeX

@inproceedings{carlini2025icml-autoadvexbench,
  title     = {{AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses}},
  author    = {Carlini, Nicholas and Debenedetti, Edoardo and Rando, Javier and Nasr, Milad and Tramèr, Florian},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {6778--6793},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/carlini2025icml-autoadvexbench/}
}