Defending Against Whitebox Adversarial Attacks via Randomized Discretization
Abstract
Adversarial perturbations dramatically decrease the accuracy of state-of-the-art image classifiers. In this paper, we propose and analyze a simple and computationally efficient defense strategy: inject random Gaussian noise, discretize each pixel, and then feed the result into any pre-trained classifier. Theoretically, we show that our randomized discretization strategy reduces the KL divergence between original and adversarial inputs, leading to a lower bound on the classification accuracy of any classifier against any (potentially whitebox) $L_{\infty}$-bounded adversarial attack. Empirically, we evaluate our defense on adversarial examples generated by a strong iterative PGD attack. On ImageNet, our defense is more robust than adversarially-trained networks and the winning defenses of the NIPS 2017 Adversarial Attacks & Defenses competition.
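The defense described above can be sketched in a few lines: add i.i.d. Gaussian noise to the input, then snap each pixel to its nearest value in a small discrete codebook before classification. This is a minimal illustration, not the paper's exact pipeline; the noise scale `sigma` and the evenly spaced gray-level codebook are illustrative assumptions (the paper's hyperparameters and color codebook may differ).

```python
import numpy as np

def randomized_discretization(image, sigma=0.3, levels=None, seed=None):
    """Sketch of the randomized-discretization preprocessing step.

    image: float array with values in [0, 1].
    sigma: Gaussian noise scale (illustrative choice).
    levels: discrete codebook of pixel values; defaults to a few
        evenly spaced gray levels (illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    if levels is None:
        levels = np.linspace(0.0, 1.0, 5)
    # Step 1: inject random Gaussian noise.
    noisy = image + rng.normal(scale=sigma, size=image.shape)
    # Step 2: snap each noisy pixel to the nearest codebook value.
    idx = np.abs(noisy[..., None] - levels).argmin(axis=-1)
    return levels[idx]

# The discretized image would then be fed to any pre-trained classifier.
x = np.random.rand(8, 8)
x_hat = randomized_discretization(x, seed=0)
```

Because the noise is resampled on every forward pass, the transformation is stochastic, which is what drives the KL-divergence bound between clean and adversarial inputs.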
Cite
Text
Zhang and Liang. "Defending Against Whitebox Adversarial Attacks via Randomized Discretization." Artificial Intelligence and Statistics, 2019.

Markdown

[Zhang and Liang. "Defending Against Whitebox Adversarial Attacks via Randomized Discretization." Artificial Intelligence and Statistics, 2019.](https://mlanthology.org/aistats/2019/zhang2019aistats-defending/)

BibTeX
@inproceedings{zhang2019aistats-defending,
title = {{Defending Against Whitebox Adversarial Attacks via Randomized Discretization}},
author = {Zhang, Yuchen and Liang, Percy},
booktitle = {Artificial Intelligence and Statistics},
year = {2019},
pages = {684-693},
volume = {89},
url = {https://mlanthology.org/aistats/2019/zhang2019aistats-defending/}
}