Cowpox: Towards the Immunity of VLM-Based Multi-Agent Systems

Abstract

Vision Language Model (VLM) agents are stateful, autonomous entities capable of perceiving and interacting with their environments through vision and language. Multi-agent systems comprise specialized agents that collaborate to solve a complex task. A core security property is robustness: the system maintains its integrity under adversarial attack. Multi-agent systems lack robustness, because a successful exploit against one agent can spread to and infect other agents, undermining the integrity of the entire system. We propose Cowpox, a defense that provably enhances the robustness of a multi-agent system through a distributed mechanism that improves the recovery rate of agents by limiting the expected number of infections spread to other agents. The core idea is to generate and distribute a special cure sample that immunizes an agent against the attack before exposure. We demonstrate the effectiveness of Cowpox empirically and provide theoretical robustness guarantees.

Cite

Text

Wu et al. "Cowpox: Towards the Immunity of VLM-Based Multi-Agent Systems." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Wu et al. "Cowpox: Towards the Immunity of VLM-Based Multi-Agent Systems." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/wu2025icml-cowpox/)

BibTeX

@inproceedings{wu2025icml-cowpox,
  title     = {{Cowpox: Towards the Immunity of VLM-Based Multi-Agent Systems}},
  author    = {Wu, Yutong and Zhang, Jie and Li, Yiming and Zhang, Chao and Guo, Qing and Qiu, Han and Lukas, Nils and Zhang, Tianwei},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {68015--68035},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/wu2025icml-cowpox/}
}