Backdoors Stuck at the Frontdoor: Multi-Agent Backdoor Attacks That Backfire

Abstract

Malicious agents in collaborative learning and outsourced data collection threaten the training of clean models. Backdoor attacks, where an attacker poisons a model during training to achieve targeted misclassification, are a major concern for train-time robustness. In this paper, we investigate a multi-agent backdoor attack scenario, where multiple attackers attempt to backdoor a victim model simultaneously. A consistent backfiring phenomenon is observed across a wide range of games, in which agents suffer from a low collective attack success rate. We examine different backdoor attack configurations, non-cooperation / cooperation, joint distribution shifts, and game setups, all of which return an equilibrium attack success rate at the lower bound. The results motivate a re-evaluation of backdoor defense research for practical environments.
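
The setup described in the abstract can be made concrete with a small toy simulation. The sketch below (plain Python with numpy and scikit-learn; the dataset sizes, trigger shapes, poisoning rates, and variable names are illustrative assumptions, not the paper's experimental protocol) shows several attackers each stamping a distinct trigger patch onto their own slice of the training data and relabeling it to their chosen target class, then measuring each attacker's attack success rate (ASR) on triggered test inputs.

# Illustrative multi-agent backdoor poisoning sketch (assumed setup,
# not the paper's protocol). Requires numpy and scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_train, n_test, side, n_classes = 2000, 500, 8, 4
X_train = rng.random((n_train, side * side))
y_train = rng.integers(0, n_classes, n_train)
X_test = rng.random((n_test, side * side))

def stamp(X, trigger_idx):
    """Overwrite a small block of pixels (the backdoor trigger) with 1s."""
    X = X.copy()
    X[:, trigger_idx] = 1.0
    return X

# Each attacker owns a distinct trigger location and a target label.
attackers = [
    {"trigger": np.arange(k * 4, k * 4 + 4), "target": k % n_classes}
    for k in range(3)
]

# Each attacker poisons a disjoint 10% slice of the training set.
slice_size = int(0.1 * n_train)
for k, atk in enumerate(attackers):
    idx = np.arange(k * slice_size, (k + 1) * slice_size)
    X_train[idx] = stamp(X_train[idx], atk["trigger"])
    y_train[idx] = atk["target"]

# The victim trains on the jointly poisoned data.
model = LogisticRegression(max_iter=2000).fit(X_train, y_train)

# Attack success rate (ASR): fraction of triggered test inputs that the
# model assigns to that attacker's target label.
for k, atk in enumerate(attackers):
    preds = model.predict(stamp(X_test, atk["trigger"]))
    print(f"attacker {k}: ASR = {np.mean(preds == atk['target']):.2f}")

The sketch only illustrates how a multi-agent poisoning game and a per-attacker ASR are defined; the backfiring result itself, where the collective ASR settles at its lower bound as attackers with conflicting triggers and targets are added, is established in the paper on real image benchmarks and deep models.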

Cite

Text

Datta and Shadbolt. "Backdoors Stuck at the Frontdoor: Multi-Agent Backdoor Attacks That Backfire." ICLR 2022 Workshops: GMS, 2022.

Markdown

[Datta and Shadbolt. "Backdoors Stuck at the Frontdoor: Multi-Agent Backdoor Attacks That Backfire." ICLR 2022 Workshops: GMS, 2022.](https://mlanthology.org/iclrw/2022/datta2022iclrw-backdoors/)

BibTeX

@inproceedings{datta2022iclrw-backdoors,
  title     = {{Backdoors Stuck at the Frontdoor: Multi-Agent Backdoor Attacks That Backfire}},
  author    = {Datta, Siddhartha and Shadbolt, Nigel},
  booktitle = {ICLR 2022 Workshops: GMS},
  year      = {2022},
  url       = {https://mlanthology.org/iclrw/2022/datta2022iclrw-backdoors/}
}