A Unified Approach to Interpreting and Boosting Adversarial Transferability
Abstract
In this paper, we use the interactions inside adversarial perturbations to explain and boost adversarial transferability. We discover and prove a negative correlation between adversarial transferability and the interactions inside adversarial perturbations. This negative correlation is further verified across different DNNs with various inputs. Moreover, it provides a unified perspective for understanding current transferability-boosting methods: we prove that some classic methods of enhancing transferability essentially decrease interactions inside adversarial perturbations. Based on this, we propose to directly penalize interactions during the attacking process, which significantly improves adversarial transferability. We will release the code when the paper is accepted.
Cite
Text
Wang et al. "A Unified Approach to Interpreting and Boosting Adversarial Transferability." International Conference on Learning Representations, 2021.
Markdown
[Wang et al. "A Unified Approach to Interpreting and Boosting Adversarial Transferability." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/wang2021iclr-unified/)
BibTeX
@inproceedings{wang2021iclr-unified,
title = {{A Unified Approach to Interpreting and Boosting Adversarial Transferability}},
author = {Wang, Xin and Ren, Jie and Lin, Shuyun and Zhu, Xiangming and Wang, Yisen and Zhang, Quanshi},
booktitle = {International Conference on Learning Representations},
year = {2021},
url = {https://mlanthology.org/iclr/2021/wang2021iclr-unified/}
}