AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses
Abstract
We introduce AutoAdvExBench, a benchmark to evaluate whether large language models (LLMs) can autonomously exploit defenses to adversarial examples. Unlike existing security benchmarks that often serve as proxies for real-world tasks, AutoAdvExBench directly measures LLMs’ success on tasks regularly performed by machine learning security experts. This approach offers a significant advantage: if an LLM could solve the challenges presented in AutoAdvExBench, it would immediately present practical utility for adversarial machine learning researchers. While our strongest ensemble of agents can break 87% of CTF-like ("homework exercise") adversarial example defenses, it breaks just 37% of real-world defenses, indicating a large gap between the difficulty of attacking real-world code and CTF-like code. Moreover, LLMs that are good at CTFs are not always good at real-world defenses; for example, Claude Sonnet 3.5 has a nearly identical attack success rate to Opus 4 on the CTF-like defenses (75% vs. 79%), but on the real-world defenses Sonnet 3.5 breaks just 13% of defenses compared to Opus 4’s 30%. We make this benchmark available at https://github.com/ethz-spylab/AutoAdvExBench.
Cite
Text
Carlini et al. "AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Carlini et al. "AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/carlini2025icml-autoadvexbench/)
BibTeX
@inproceedings{carlini2025icml-autoadvexbench,
title = {{AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses}},
author = {Carlini, Nicholas and Debenedetti, Edoardo and Rando, Javier and Nasr, Milad and Tramèr, Florian},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
  pages = {6778--6793},
volume = {267},
url = {https://mlanthology.org/icml/2025/carlini2025icml-autoadvexbench/}
}