Denoised Smoothing: A Provable Defense for Pretrained Classifiers

Abstract

We present a method for provably defending any pretrained image classifier against $\ell_p$ adversarial attacks. This method, for instance, allows public vision API providers and users to seamlessly convert pretrained non-robust classification services into provably robust ones. By prepending a custom-trained denoiser to any off-the-shelf image classifier and using randomized smoothing, we effectively create a new classifier that is guaranteed to be $\ell_p$-robust to adversarial examples, without modifying the pretrained classifier. Our approach applies to both the white-box and the black-box settings of the pretrained classifier. We refer to this defense as denoised smoothing, and we demonstrate its effectiveness through extensive experimentation on ImageNet and CIFAR-10. Finally, we use our approach to provably defend the Azure, Google, AWS, and ClarifAI image classification APIs. Our code replicating all the experiments in the paper can be found at: https://github.com/microsoft/denoised-smoothing.
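The prediction step described in the abstract (add Gaussian noise, denoise, classify, then take a majority vote) can be illustrated with a minimal sketch. This is not the authors' implementation from the linked repository; the function name, signature, and parameters below are hypothetical, and `denoiser` and `classifier` are assumed to be pretrained PyTorch modules, with the classifier left unmodified.

```python
import torch

def denoised_smoothing_predict(classifier, denoiser, x, sigma=0.25, n_samples=100):
    """Hypothetical sketch of the smoothed prediction for one image x
    (shape C x H x W, values in [0, 1]); the pretrained classifier is frozen."""
    votes = {}
    with torch.no_grad():
        for _ in range(n_samples):
            # Randomized smoothing: perturb the input with isotropic Gaussian noise.
            noisy = x + sigma * torch.randn_like(x)
            # Denoised smoothing: the custom-trained denoiser removes the noise
            # before the off-the-shelf classifier sees the image.
            denoised = denoiser(noisy.unsqueeze(0))
            pred = classifier(denoised).argmax(dim=1).item()
            votes[pred] = votes.get(pred, 0) + 1
    # The smoothed classifier's output is the majority vote over noisy samples.
    return max(votes, key=votes.get)
```

Obtaining the certified $\ell_p$ radius additionally requires the statistical certification procedure of randomized smoothing; the sketch above only shows how prepending a denoiser turns a standard classifier into the smoothed one.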

Cite

Text

Salman et al. "Denoised Smoothing: A Provable Defense for Pretrained Classifiers." Neural Information Processing Systems, 2020.

Markdown

[Salman et al. "Denoised Smoothing: A Provable Defense for Pretrained Classifiers." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/salman2020neurips-denoised/)

BibTeX

@inproceedings{salman2020neurips-denoised,
  title     = {{Denoised Smoothing: A Provable Defense for Pretrained Classifiers}},
  author    = {Salman, Hadi and Sun, Mingjie and Yang, Greg and Kapoor, Ashish and Kolter, J. Zico},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/salman2020neurips-denoised/}
}