Feature Space Perturbations Yield More Transferable Adversarial Examples

Abstract

Many recent works have shown that deep learning models are vulnerable to quasi-imperceptible input perturbations, yet practitioners cannot fully explain this behavior. This work describes a transfer-based, blackbox, targeted adversarial attack on deep feature space representations that also provides insights into cross-model class representations of deep CNNs. The attack is explicitly designed for transferability: it drives the feature space representation of a source image at layer L towards the representation of a target image at the same layer L. The attack yields highly transferable targeted examples that outperform competition-winning methods by over 30% in targeted attack metrics. We also show that the choice of the layer L from which examples are generated is important and that transferability characteristics are agnostic to the blackbox model, and our results indicate that well-trained deep models have similar highly abstract representations.
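The core idea of pulling a source image's layer-L features toward a target image's layer-L features can be illustrated with a minimal NumPy sketch. It assumes a toy one-layer ReLU network standing in for layer L of a whitebox source CNN, a squared Euclidean feature distance, and iterative FGSM-style sign steps under an L-infinity budget. The network, the names `features`, `feature_loss` and `feature_space_attack`, the parameters `eps`, `alpha` and `steps`, and the choice to return the lowest-loss iterate are all hypothetical illustration choices, not the authors' implementation. The gradient is derived by hand so the example has no deep learning dependency.

```python
import numpy as np


def features(W, x):
    """Hypothetical stand-in for 'layer L' of a CNN: ReLU(W @ x)."""
    return np.maximum(W @ x, 0.0)


def feature_loss(W, x, target_feats):
    """Half squared Euclidean distance between the layer-L features of x and the target's."""
    diff = features(W, x) - target_feats
    return 0.5 * float(diff @ diff)


def feature_space_attack(W, x_src, x_tgt, eps=0.1, alpha=0.01, steps=20):
    """Sign-gradient descent on the feature distance, kept inside an L-inf ball around x_src."""
    target_feats = features(W, x_tgt)  # fixed target representation at layer L
    x = x_src.copy()
    best_x, best_loss = x.copy(), feature_loss(W, x, target_feats)
    for _ in range(steps):
        pre = W @ x
        diff = np.maximum(pre, 0.0) - target_feats
        # Gradient of 0.5 * ||ReLU(W x) - t||^2 w.r.t. x; ReLU passes gradient only where pre > 0.
        grad = W.T @ (diff * (pre > 0))
        # Step against the gradient to move the features TOWARD the target representation.
        x = x - alpha * np.sign(grad)
        # Project into the L-inf ball around the source image and the valid [0, 1] pixel range.
        x = np.clip(x, x_src - eps, x_src + eps)
        x = np.clip(x, 0.0, 1.0)
        loss = feature_loss(W, x, target_feats)
        if loss < best_loss:  # keep the iterate closest to the target features
            best_x, best_loss = x.copy(), loss
    return best_x


# Usage on random toy data (8-dim "images", 16 layer-L features).
rng = np.random.default_rng(0)
W = rng.normal(0.0, 1.0 / np.sqrt(8), size=(16, 8))
x_src = rng.uniform(0.2, 0.8, size=8)
x_tgt = rng.uniform(0.2, 0.8, size=8)
x_adv = feature_space_attack(W, x_src, x_tgt)
t = features(W, x_tgt)
print(feature_loss(W, x_src, t), feature_loss(W, x_adv, t))
```

In a realistic setting, `features` would be a pretrained CNN truncated at the chosen layer L, the gradient would come from automatic differentiation, and `x_adv` would then be fed to separately trained blackbox models to measure how often they predict the target class.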

Cite

Text

Inkawhich et al. "Feature Space Perturbations Yield More Transferable Adversarial Examples." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.00723

Markdown

[Inkawhich et al. "Feature Space Perturbations Yield More Transferable Adversarial Examples." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/inkawhich2019cvpr-feature/) doi:10.1109/CVPR.2019.00723

BibTeX

@inproceedings{inkawhich2019cvpr-feature,
  title     = {{Feature Space Perturbations Yield More Transferable Adversarial Examples}},
  author    = {Inkawhich, Nathan and Wen, Wei and Li, Hai and Chen, Yiran},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2019},
  doi       = {10.1109/CVPR.2019.00723},
  url       = {https://mlanthology.org/cvpr/2019/inkawhich2019cvpr-feature/}
}