Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition

Abstract

AI agents are rapidly being deployed across diverse industries, but can they adhere to deployment policies under attack? We organized a one-month red teaming challenge, the largest of its kind to date, in which expert red teamers attempted to elicit policy violations from AI agents powered by 22 frontier LLMs. The challenge collected 1.8 million prompt injection attacks, yielding over 60,000 documented successful policy violations and revealing critical vulnerabilities. Using this extensive data, we construct a challenging AI agent red teaming benchmark on which attacks currently achieve near-100% success rates across all tested agents and their associated policies. Further analysis reveals high transferability and universality of successful attacks, underscoring the scale and severity of existing AI agent vulnerabilities. We also observe minimal correlation between agent robustness and factors such as model capability, size, or inference compute budget, indicating that substantial improvements in defenses are needed. We hope our benchmark and insights drive further research toward more secure and reliable AI agents.
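The abstract reports attack success rates and transferability without stating the exact formulas. The sketch below shows one common way such quantities can be computed from red teaming logs, assuming that the attack success rate (ASR) for an agent is the fraction of attack attempts judged to cause a policy violation, and that the transfer rate from a source agent to a target agent is the fraction of attacks that succeed on the source and also succeed on the target. The record format, the function names `attack_success_rate` and `transfer_rate`, and the toy data are all hypothetical and are not taken from the paper's benchmark.

```python
from collections import defaultdict

# Hypothetical log format: (attack_id, agent_name, violated_policy).
# Assumes at most one attempt per (attack_id, agent_name) pair.
records = [
    ("a1", "agent_x", True), ("a1", "agent_y", True),
    ("a2", "agent_x", True), ("a2", "agent_y", False),
    ("a3", "agent_x", False), ("a3", "agent_y", True),
]

def attack_success_rate(records):
    """Per-agent fraction of attack attempts judged to be policy violations."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for _attack_id, agent, violated in records:
        totals[agent] += 1
        successes[agent] += int(violated)
    return {agent: successes[agent] / totals[agent] for agent in totals}

def transfer_rate(records, source, target):
    """Among attacks that succeed on `source` and were also run on `target`,
    the fraction that also succeed on `target` (NaN if there are none)."""
    outcome = {(attack_id, agent): violated for attack_id, agent, violated in records}
    transferred = [
        outcome[(attack_id, target)]
        for (attack_id, agent), violated in outcome.items()
        if agent == source and violated and (attack_id, target) in outcome
    ]
    return sum(transferred) / len(transferred) if transferred else float("nan")

print(attack_success_rate(records))
print(transfer_rate(records, "agent_x", "agent_y"))
```

On the toy data, both agents have an ASR of 2/3, and half of the attacks that succeed on `agent_x` also succeed on `agent_y`. A real analysis would also need a rule for repeated attempts and for how violations are judged, which this sketch does not model.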

Cite

Text

Zou et al. "Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zou et al. "Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zou2025neurips-security/)

BibTeX

@inproceedings{zou2025neurips-security,
  title     = {{Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition}},
  author    = {Zou, Andy and Lin, Maxwell and Jones, Eliot Krzysztof and Nowak, Micha V. and Dziemian, Mateusz and Winter, Nick and Nathanael, Valent and Croft, Ayla and Davies, Xander and Patel, Jai and Kirk, Robert and Gal, Yarin and Hendrycks, Dan and Kolter, J Zico and Fredrikson, Matt},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zou2025neurips-security/}
}