AI Agents with Formal Security Guarantees

Abstract

We propose a novel system that enables secure and controllable AI agents by enhancing them with a formal security analyzer. In contrast to prior work, our system does not try to detect prompt injections on a best-effort basis; instead, it imposes hard constraints on the agent's actions, thereby preventing the effects of an injection. The constraints are specified in a novel and flexible domain-specific language for security rules. Before the agent takes an action, the analyzer checks the current agent state for violations of any of the provided policy rules and raises an error if the proposed action is not allowed. When the analyzer determines an action to be safe, it does so with a formal guarantee that none of the rules specified in the policy are violated. We show that our analyzer is effective, detecting and preventing security vulnerabilities in real-world agents.
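To make the pre-execution check concrete, here is a minimal Python sketch of the architecture the abstract describes: a policy is a set of rules over the agent's state and a proposed action, and the analyzer raises an error before the action executes if any rule is violated. All names here (Rule, Analyzer, PolicyViolation) and the predicate-based rule encoding are illustrative assumptions, not the paper's actual domain-specific language.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    """Snapshot of the agent's trace: messages seen and actions taken so far."""
    messages: list[str] = field(default_factory=list)
    actions: list[dict] = field(default_factory=list)


@dataclass
class Rule:
    """A security rule: a predicate over (state, proposed action)."""
    description: str
    violated: Callable[[AgentState, dict], bool]


class PolicyViolation(Exception):
    pass


class Analyzer:
    def __init__(self, rules: list[Rule]):
        self.rules = rules

    def check(self, state: AgentState, action: dict) -> None:
        """Raise before execution if any rule is violated; an action that
        passes is, by construction, consistent with every rule in the policy."""
        for rule in self.rules:
            if rule.violated(state, action):
                raise PolicyViolation(rule.description)


# Hypothetical example policy: block sending email after the agent has
# fetched untrusted web content (a typical prompt-injection vector).
rules = [
    Rule(
        description="no email send after visiting an untrusted URL",
        violated=lambda s, a: a.get("tool") == "send_email"
        and any(prev.get("tool") == "fetch_url" for prev in s.actions),
    )
]

analyzer = Analyzer(rules)
state = AgentState(actions=[{"tool": "fetch_url", "url": "http://attacker.example"}])
try:
    analyzer.check(state, {"tool": "send_email", "to": "a@b.com"})
except PolicyViolation as e:
    print("blocked:", e)  # blocked: no email send after visiting an untrusted URL
```

In this reading, the hard guarantee comes from the check being exhaustive over the policy and mandatory before every action, rather than from any probabilistic injection detector.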

Cite

Text

Balunovic et al. "AI Agents with Formal Security Guarantees." ICML 2024 Workshops: NextGenAISafety, 2024.

Markdown

[Balunovic et al. "AI Agents with Formal Security Guarantees." ICML 2024 Workshops: NextGenAISafety, 2024.](https://mlanthology.org/icmlw/2024/balunovic2024icmlw-ai/)

BibTeX

@inproceedings{balunovic2024icmlw-ai,
  title     = {{AI Agents with Formal Security Guarantees}},
  author    = {Balunovic, Mislav and Beurer-Kellner, Luca and Fischer, Marc and Vechev, Martin},
  booktitle = {ICML 2024 Workshops: NextGenAISafety},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/balunovic2024icmlw-ai/}
}