Transferable Perturbations of Deep Feature Distributions

Abstract

Almost all current adversarial attacks on CNN classifiers rely on information derived from the output layer of the network. This work presents a new adversarial attack based on the modeling and exploitation of class-wise and layer-wise deep feature distributions. We achieve state-of-the-art targeted black-box transfer-based attack results for undefended ImageNet models. Further, we place a priority on the explainability and interpretability of the attacking process. Our methodology affords an analysis of how adversarial attacks change the intermediate feature distributions of CNNs, as well as a measure of layer-wise and class-wise feature distributional separability/entanglement. We also conceptualize a transition from task/data-specific to model-specific features within a CNN architecture that directly impacts the transferability of adversarial examples.
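The core idea can be sketched with a toy example: fit an auxiliary probability model over a layer's features for the target class, then perturb the input to raise that model's likelihood instead of the output-layer loss. Everything below is a hypothetical stand-in, not the paper's actual models: a ReLU layer plays the role of the intermediate feature extractor, and a logistic probe plays the role of the learned class-wise feature-distribution model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins (assumptions, not the paper's components):
# f(x) = ReLU(W1 @ x) acts as one intermediate feature layer, and
# p(target | f) = sigmoid(w @ f + b) acts as an auxiliary probe trained
# to model the target class's feature distribution at that layer.
W1 = rng.normal(size=(8, 4))
w = rng.normal(size=8)
b = 0.0

def features(x):
    return np.maximum(W1 @ x, 0.0)           # ReLU feature layer

def target_prob(x):
    return 1.0 / (1.0 + np.exp(-(w @ features(x) + b)))

def feature_attack_step(x, x0, eps=0.5, lr=0.1):
    """One signed ascent step on log p(target | f(x)), projected back
    onto an L-infinity ball of radius eps around the clean input x0."""
    pre = W1 @ x
    mask = (pre > 0).astype(float)            # ReLU derivative
    p = target_prob(x)
    grad_x = W1.T @ (mask * w) * (1.0 - p)    # d log p / d x
    x_adv = x + lr * np.sign(grad_x)          # FGSM-style signed step
    return np.clip(x_adv, x0 - eps, x0 + eps)

x0 = rng.normal(size=4)
x = x0.copy()
for _ in range(20):
    x = feature_attack_step(x, x0)

print("clean:", target_prob(x0), "adversarial:", target_prob(x))
```

Because the objective depends only on an intermediate layer's feature statistics rather than the source model's output logits, the resulting perturbation is less tied to one network's decision boundary, which is the intuition behind the improved transferability reported in the paper.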

Cite

Text

Inkawhich et al. "Transferable Perturbations of Deep Feature Distributions." International Conference on Learning Representations, 2020.

Markdown

[Inkawhich et al. "Transferable Perturbations of Deep Feature Distributions." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/inkawhich2020iclr-transferable/)

BibTeX

@inproceedings{inkawhich2020iclr-transferable,
  title     = {{Transferable Perturbations of Deep Feature Distributions}},
  author    = {Inkawhich, Nathan and Liang, Kevin J and Carin, Lawrence and Chen, Yiran},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/inkawhich2020iclr-transferable/}
}