Transferable Perturbations of Deep Feature Distributions

Abstract

Almost all current adversarial attacks on CNN classifiers rely on information derived from the output layer of the network. This work presents a new adversarial attack based on the modeling and exploitation of class-wise and layer-wise deep feature distributions. We achieve state-of-the-art targeted black-box transfer-based attack results for undefended ImageNet models. Further, we place a priority on the explainability and interpretability of the attacking process. Our methodology affords an analysis of how adversarial attacks change the intermediate feature distributions of CNNs, as well as a measure of layer-wise and class-wise feature distributional separability/entanglement. We also conceptualize a transition from task/data-specific to model-specific features within a CNN architecture that directly impacts the transferability of adversarial examples.
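The core idea can be sketched with a toy example: fit an auxiliary probability model over a layer's features for the target class, then perturb the input to raise that model's likelihood instead of the output-layer loss. Everything below is a hypothetical stand-in, not the paper's actual models: a ReLU layer plays the role of the intermediate feature extractor, and a logistic probe plays the role of the learned class-wise feature-distribution model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins (assumptions, not the paper's components):
# f(x) = ReLU(W1 @ x) acts as one intermediate feature layer, and
# p(target | f) = sigmoid(w @ f + b) acts as an auxiliary probe trained
# to model the target class's feature distribution at that layer.
W1 = rng.normal(size=(8, 4))
w = rng.normal(size=8)
b = 0.0

def features(x):
    return np.maximum(W1 @ x, 0.0)           # ReLU feature layer

def target_prob(x):
    return 1.0 / (1.0 + np.exp(-(w @ features(x) + b)))

def feature_attack_step(x, x0, eps=0.5, lr=0.1):
    """One signed ascent step on log p(target | f(x)), projected back
    onto an L-infinity ball of radius eps around the clean input x0."""
    pre = W1 @ x
    mask = (pre > 0).astype(float)            # ReLU derivative
    p = target_prob(x)
    grad_x = W1.T @ (mask * w) * (1.0 - p)    # d log p / d x
    x_adv = x + lr * np.sign(grad_x)          # FGSM-style signed step
    return np.clip(x_adv, x0 - eps, x0 + eps)

x0 = rng.normal(size=4)
x = x0.copy()
for _ in range(20):
    x = feature_attack_step(x, x0)

print("clean:", target_prob(x0), "adversarial:", target_prob(x))
```

Because the objective depends only on an intermediate layer's feature statistics rather than the source model's output logits, the resulting perturbation is less tied to one network's decision boundary, which is the intuition behind the improved transferability reported in the paper.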

Cite

Text

Inkawhich et al. "Transferable Perturbations of Deep Feature Distributions." International Conference on Learning Representations, 2020.

Markdown

[Inkawhich et al. "Transferable Perturbations of Deep Feature Distributions." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/inkawhich2020iclr-transferable/)

BibTeX

@inproceedings{inkawhich2020iclr-transferable,
  title     = {{Transferable Perturbations of Deep Feature Distributions}},
  author    = {Inkawhich, Nathan and Liang, Kevin J and Carin, Lawrence and Chen, Yiran},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/inkawhich2020iclr-transferable/}
}