Minimally Distorted Adversarial Examples with a Fast Adaptive Boundary Attack

Abstract

The robustness of neural-network-based classifiers against adversarial manipulation is mainly evaluated with empirical attacks, since methods for its exact computation, even when available, do not scale to large networks. In this paper we propose a new white-box adversarial attack with respect to the $l_p$-norms for $p \in \{1,2,\infty\}$, which aims to find the minimal perturbation necessary to change the class of a given input. It has an intuitive geometric meaning, quickly yields high-quality results, and minimizes the size of the perturbation, so that a single run returns the robust accuracy at every threshold. It performs better than or similar to state-of-the-art attacks which are partially specialized to one $l_p$-norm, and is robust to the phenomenon of gradient obfuscation.
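Why one run of a minimum-norm attack gives the robust accuracy at every threshold: the attack returns, for each input, the norm of the smallest adversarial perturbation it found. An input then counts as robust at threshold $\epsilon$ exactly when that norm is strictly larger than $\epsilon$, so the whole accuracy-vs-$\epsilon$ curve can be read off the stored norms. The sketch below assumes these per-input norms are already available; the function names and the conventions (norm 0 for inputs already misclassified, infinity when no adversarial example was found) are hypothetical and not taken from the paper's code. Because an attack only upper-bounds the true minimal perturbation, the resulting numbers are upper bounds on the true robust accuracy.

```python
from typing import List, Sequence


def robust_accuracy(min_norms: Sequence[float], eps: float) -> float:
    """Fraction of inputs still robust at threshold eps.

    Hypothetical conventions: 0.0 marks an input misclassified without any
    perturbation; float("inf") marks an input where no adversarial example
    was found. A perturbation of norm r fools the model, so the input is
    non-robust for every eps >= r and robust only when r > eps.
    """
    if len(min_norms) == 0:
        raise ValueError("need at least one input")
    return sum(1 for r in min_norms if r > eps) / len(min_norms)


def robust_accuracy_curve(min_norms: Sequence[float],
                          thresholds: Sequence[float]) -> List[float]:
    """Robust accuracy at each threshold, reusing one set of attack results."""
    return [robust_accuracy(min_norms, eps) for eps in thresholds]


if __name__ == "__main__":
    # Five inputs: one misclassified, three broken at increasing norms,
    # one for which the attack found no adversarial example.
    norms = [0.0, 0.1, 0.25, 0.5, float("inf")]
    print(robust_accuracy_curve(norms, [0.0, 0.1, 0.3, 1.0]))
```

Running this prints `[0.8, 0.6, 0.4, 0.2]`: evaluating a new threshold only re-counts the stored norms, while a fixed-$\epsilon$ attack would have to be rerun for each threshold.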

Cite

Text

Croce and Hein. "Minimally Distorted Adversarial Examples with a Fast Adaptive Boundary Attack." International Conference on Machine Learning, 2020.

Markdown

[Croce and Hein. "Minimally Distorted Adversarial Examples with a Fast Adaptive Boundary Attack." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/croce2020icml-minimally/)

BibTeX

@inproceedings{croce2020icml-minimally,
  title     = {{Minimally Distorted Adversarial Examples with a Fast Adaptive Boundary Attack}},
  author    = {Croce, Francesco and Hein, Matthias},
  booktitle = {International Conference on Machine Learning},
  year      = {2020},
  pages     = {2196--2205},
  volume    = {119},
  url       = {https://mlanthology.org/icml/2020/croce2020icml-minimally/}
}