Transferable Adversarial Perturbations
Abstract
State-of-the-art deep neural network classifiers are highly vulnerable to adversarial examples, which are designed to mislead classifiers with very small perturbations. However, the performance of black-box attacks (without knowledge of the model parameters) against deployed models always degrades significantly. In this paper, we propose a novel method of crafting adversarial perturbations that enables black-box transfer. We first show that maximizing the distance between natural images and their adversarial examples in the intermediate feature maps can improve both white-box attacks (with knowledge of the model parameters) and black-box attacks. We also show that smooth regularization on adversarial perturbations enables transferability across models. Extensive experimental results show that our approach outperforms state-of-the-art methods in both white-box and black-box attacks.
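The abstract describes two ingredients: maximizing the distance between the intermediate feature maps of a natural image and its adversarial example, and smoothing the perturbation itself. Below is a minimal PyTorch sketch of one attack step combining both terms; it is not the authors' released code, and the names `model_features`, `lambda_smooth`, and `alpha` are illustrative assumptions rather than values from the paper.

```python
import torch

def feature_distance_loss(feat_nat, feat_adv):
    # Maximize the L2 distance between intermediate feature maps of the
    # natural image and its adversarial example (negated for minimization).
    return -torch.norm(feat_adv - feat_nat, p=2)

def smoothness_loss(perturbation):
    # Penalize high-frequency components of the perturbation
    # (a total-variation-style regularizer) to aid cross-model transfer.
    dh = perturbation[:, :, 1:, :] - perturbation[:, :, :-1, :]
    dw = perturbation[:, :, :, 1:] - perturbation[:, :, :, :-1]
    return dh.abs().mean() + dw.abs().mean()

def attack_step(model_features, x_nat, x_adv, lambda_smooth=0.5, alpha=1.0 / 255):
    # model_features: a callable mapping an image batch to an intermediate
    # feature map (hypothetical; e.g. a truncated classifier backbone).
    x_adv = x_adv.clone().detach().requires_grad_(True)
    with torch.no_grad():
        feat_nat = model_features(x_nat)
    feat_adv = model_features(x_adv)
    # Push intermediate features apart while keeping the perturbation smooth.
    loss = feature_distance_loss(feat_nat, feat_adv) \
        + lambda_smooth * smoothness_loss(x_adv - x_nat)
    loss.backward()
    # Signed-gradient descent on the combined loss, clipped to valid pixels.
    return (x_adv - alpha * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```

Iterating `attack_step` from `x_adv = x_nat` yields a perturbation shaped by both objectives; the full method in the paper includes further terms and implementation details not reproduced here.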
Cite
Text
Zhou et al. "Transferable Adversarial Perturbations." Proceedings of the European Conference on Computer Vision (ECCV), 2018. doi:10.1007/978-3-030-01264-9_28
Markdown
[Zhou et al. "Transferable Adversarial Perturbations." Proceedings of the European Conference on Computer Vision (ECCV), 2018.](https://mlanthology.org/eccv/2018/zhou2018eccv-transferable/) doi:10.1007/978-3-030-01264-9_28
BibTeX
@inproceedings{zhou2018eccv-transferable,
title = {{Transferable Adversarial Perturbations}},
author = {Zhou, Wen and Hou, Xin and Chen, Yongjun and Tang, Mengyun and Huang, Xiangqi and Gan, Xiang and Yang, Yong},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2018},
doi = {10.1007/978-3-030-01264-9_28},
url = {https://mlanthology.org/eccv/2018/zhou2018eccv-transferable/}
}