Combating Adversaries with Anti-Adversaries

Abstract

Deep neural networks are vulnerable to small input perturbations known as adversarial attacks. Inspired by the fact that these adversaries are constructed by iteratively minimizing a network's confidence in the true class label, we propose the anti-adversary layer, which aims to counter this effect. In particular, our layer generates an input perturbation in the direction opposite to the adversarial one and feeds the classifier the perturbed version of the input. Our approach is training-free and theoretically supported. We verify its effectiveness by combining our layer with both nominally and robustly trained models, and we conduct large-scale experiments ranging from black-box to adaptive attacks on CIFAR10, CIFAR100, and ImageNet. Our anti-adversary layer significantly enhances model robustness at no cost to clean accuracy.
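
The mechanism described in the abstract is straightforward to sketch: given an input, the layer first takes the model's own prediction as a pseudo-label, then applies a few signed-gradient steps that increase the model's confidence in that pseudo-label (the reverse of an FGSM-style attack step) before classifying the perturbed input. Below is a minimal PyTorch sketch of this idea; the step count k, step size alpha, and the function name are illustrative assumptions, not the authors' exact formulation or hyperparameters.

import torch
import torch.nn.functional as F

def anti_adversary_forward(model, x, k=2, alpha=0.15):
    # Classify x + gamma, where gamma perturbs the input in the direction
    # opposite to an adversarial attack: it maximizes the model's
    # confidence in its own initial prediction. (Sketch; hyperparameters
    # are assumptions, not taken from the paper.)
    model.eval()
    with torch.no_grad():
        y_hat = model(x).argmax(dim=1)  # pseudo-label from the clean input

    gamma = torch.zeros_like(x, requires_grad=True)
    for _ in range(k):
        loss = F.cross_entropy(model(x + gamma), y_hat)
        (grad,) = torch.autograd.grad(loss, gamma)
        # Signed gradient descent on the pseudo-label loss, i.e. the
        # reverse of an FGSM-style ascent step.
        gamma = (gamma - alpha * grad.sign()).detach().requires_grad_(True)

    with torch.no_grad():
        # Logits for the anti-adversarially perturbed input.
        return model(x + gamma)

Because the layer only wraps the forward pass, it can be combined with any pretrained classifier without retraining, which is consistent with the training-free claim in the abstract.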

Cite

Text

Alfarra et al. "Combating Adversaries with Anti-Adversaries." ICML 2021 Workshops: AML, 2021.

Markdown

[Alfarra et al. "Combating Adversaries with Anti-Adversaries." ICML 2021 Workshops: AML, 2021.](https://mlanthology.org/icmlw/2021/alfarra2021icmlw-combating/)

BibTeX

@inproceedings{alfarra2021icmlw-combating,
  title     = {{Combating Adversaries with Anti-Adversaries}},
  author    = {Alfarra, Motasem and Perez, Juan Camilo and Thabet, Ali and Bibi, Adel and Torr, Philip and Ghanem, Bernard},
  booktitle = {ICML 2021 Workshops: AML},
  year      = {2021},
  url       = {https://mlanthology.org/icmlw/2021/alfarra2021icmlw-combating/}
}