Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System

Collado, Julian; Stangl, Kevin

NeurIPSW 2024

Abstract

Recent approaches in machine learning often solve a task using a composition of multiple models or agentic architectures. When targeting a composed system with adversarial attacks, it might not be computationally or informationally feasible to train an end-to-end proxy model or a proxy model for every component of the system. We introduce a method to craft an adversarial attack against the overall multi-model system when we only have a proxy model for the final black-box model, and when the transformation applied by the initial models can make the adversarial perturbations ineffective. Current methods handle this either by applying many copies of the first model/transformation to an input and then reusing a standard adversarial attack with averaged gradients, or by learning a proxy model for both stages. To our knowledge, this is the first attack specifically designed for this threat model, and our method has a substantially higher attack success rate (80% vs. 25%) and produces perturbations that are 9.4% smaller (measured by MSE) compared to prior state-of-the-art methods. Our experiments focus on a supervised image pipeline, but we are confident the attack will generalize to other multi-model settings (e.g., a mix of open- and closed-source foundation models) or agentic systems.
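To make the baseline mentioned in the abstract concrete, the following is a minimal sketch of the gradient-averaging idea: sample many copies of a first-stage transformation, compute the proxy model's input gradient for each copy, average, and take a projected sign step. It is not the authors' method. Everything here is an assumption for illustration: the first stage is a hypothetical random gain plus Gaussian noise, the proxy is a linear logistic classifier, and the names `first_stage`, `proxy_input_grad`, `averaged_gradient_attack`, and `mse` are invented for this sketch.

```python
import numpy as np

def first_stage(x, rng, noise_scale=0.1):
    """Hypothetical stand-in for the initial model: random gain plus noise."""
    gain = rng.uniform(0.8, 1.2)
    noise = rng.normal(0.0, noise_scale, size=x.shape)
    return gain * x + noise, gain

def proxy_input_grad(z, w, b, y):
    """Gradient of the logistic loss w.r.t. the input z of a linear proxy."""
    p = 1.0 / (1.0 + np.exp(-(z @ w + b)))
    return (p - y) * w

def averaged_gradient(x, w, b, y, rng, n_copies=32):
    """Average the proxy gradient over many sampled first-stage outputs."""
    grads = []
    for _ in range(n_copies):
        z, gain = first_stage(x, rng)
        # Chain rule through the assumed transformation: dz/dx = gain * I.
        grads.append(gain * proxy_input_grad(z, w, b, y))
    return np.mean(grads, axis=0)

def averaged_gradient_attack(x, w, b, y, eps=0.05, steps=10, seed=0):
    """Projected sign-gradient ascent on the loss, using averaged gradients."""
    rng = np.random.default_rng(seed)
    alpha = eps / 4.0  # step size, an arbitrary choice for this sketch
    x_adv = x.copy()
    for _ in range(steps):
        g = averaged_gradient(x_adv, w, b, y, rng)
        x_adv = x_adv + alpha * np.sign(g)
        # Keep the perturbation inside an L-infinity ball of radius eps.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

def mse(x, x_adv):
    """Mean squared error between clean and perturbed input."""
    return float(np.mean((x_adv - x) ** 2))

# Toy usage with made-up values.
x = np.array([0.2, 0.1, -0.3])
w = np.array([1.0, -2.0, 0.5])
x_adv = averaged_gradient_attack(x, w, b=0.0, y=1.0)
print("perturbation MSE:", mse(x, x_adv))
```

The `mse` helper mirrors the perturbation-size metric named in the abstract. In a real pipeline the first stage would not be differentiable in closed form, which is part of why the averaging baseline can be costly or ineffective.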

Cite

Text

Collado and Stangl. "Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System." NeurIPS 2024 Workshops: Red_Teaming_GenAI, 2024.

Markdown

[Collado and Stangl. "Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System." NeurIPS 2024 Workshops: Red_Teaming_GenAI, 2024.](https://mlanthology.org/neuripsw/2024/collado2024neuripsw-keep-a/)

BibTeX

@inproceedings{collado2024neuripsw-keep-a,
  title     = {{Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System}},
  author    = {Collado, Julian and Stangl, Kevin},
  booktitle = {NeurIPS 2024 Workshops: Red_Teaming_GenAI},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/collado2024neuripsw-keep-a/}
}