Detecting Adversarial Examples Is (Nearly) as Hard as Classifying Them

ICML 2022 pp. 21692-21702

Abstract

Making classifiers robust to adversarial examples is challenging. Thus, many works tackle the seemingly easier task of detecting perturbed inputs. We show a barrier towards this goal. We prove a hardness reduction between detection and classification of adversarial examples: given a robust detector for attacks at distance $\epsilon$ (in some metric), we show how to build a similarly robust (but inefficient) classifier for attacks at distance $\epsilon/2$. Our reduction is computationally inefficient, but preserves the data complexity of the original detector. The reduction thus cannot be directly used to build practical classifiers. Instead, it is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated (namely a highly robust and data-efficient classifier). To illustrate, we revisit $14$ empirical detector defenses published over the past years. For $12/14$ defenses, we show that the claimed detection results imply an inefficient classifier with robustness far beyond the state of the art, thus casting some doubt on the results' validity. Finally, we show that our reduction applies in both directions: a robust classifier for attacks at distance $\epsilon/2$ implies an inefficient robust detector at distance $\epsilon$. Thus, we argue that robust classification and robust detection should be regarded as (near)-equivalent problems, if we disregard their computational complexity.
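To make the two directions of the reduction concrete, here is a minimal sketch on a toy domain. It is not the paper's code, and everything in it is an assumption chosen for illustration: inputs are integers, the metric is absolute difference, and `toy_detector`, `toy_classifier`, `ball`, and `EPS` are hypothetical names. The abstract does not spell out the constructions; the sketch uses brute-force search over the $\epsilon/2$-ball, which is one natural reading of an "inefficient" reduction.

```python
from typing import Callable, Optional

Label = str
Detector = Callable[[int], Optional[Label]]  # None means "rejected as adversarial"
Classifier = Callable[[int], Label]


def ball(center: int, radius: int) -> range:
    """All integer points within `radius` of `center` (toy metric: |a - b|)."""
    return range(center - radius, center + radius + 1)


def toy_detector(x: int) -> Optional[Label]:
    """Hypothetical detector: accepts only points very close to a clean input."""
    if abs(x - 0) <= 1:
        return "A"
    if abs(x - 20) <= 1:
        return "B"
    return None


def classifier_from_detector(
    detector: Detector, x: int, eps: int, default: Label = "A"
) -> Label:
    """Detector robust at eps -> classifier robust at eps/2 (brute force).

    Search the eps/2-ball around x for any point the detector accepts and
    return the detector's label there. By the triangle inequality, such a
    point lies within eps of the clean input, where the detector may only
    output the true label or reject.
    """
    for candidate in ball(x, eps // 2):
        label = detector(candidate)
        if label is not None:
            return label
    return default  # no accepted point nearby: output an arbitrary label


def toy_classifier(x: int) -> Label:
    """Hypothetical classifier with a decision boundary at 10."""
    return "A" if x < 10 else "B"


def detector_from_classifier(
    classifier: Classifier, x: int, eps: int
) -> Optional[Label]:
    """Classifier robust at eps/2 -> detector robust at eps (brute force).

    Accept x only if the classifier is constant on the eps/2-ball around x;
    otherwise reject. This assumes a midpoint exists between any two points
    at distance eps, which holds here for even eps.
    """
    label = classifier(x)
    if all(classifier(c) == label for c in ball(x, eps // 2)):
        return label
    return None


EPS = 4  # toy_detector is robust at distance 4 around the clean points 0 and 20
```

In this toy setting, every input within `EPS // 2` of the clean point `0` is classified as `"A"` by `classifier_from_detector(toy_detector, x, EPS)`, and `detector_from_classifier(toy_classifier, x, EPS)` returns either `"A"` or a rejection for every input within `EPS` of `0`. On realistic input spaces the search over the ball is intractable, which is why the reduction serves as a sanity check rather than a practical defense.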

Cite

Text

Tramer. "Detecting Adversarial Examples Is (Nearly) as Hard as Classifying Them." International Conference on Machine Learning, 2022.

Markdown

[Tramer. "Detecting Adversarial Examples Is (Nearly) as Hard as Classifying Them." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/tramer2022icml-detecting/)

BibTeX

@inproceedings{tramer2022icml-detecting,
  title     = {{Detecting Adversarial Examples Is (Nearly) as Hard as Classifying Them}},
  author    = {Tramer, Florian},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {21692-21702},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/tramer2022icml-detecting/}
}