Towards a Unified Game-Theoretic View of Adversarial Perturbations and Robustness
Abstract
This paper provides a unified view that explains different adversarial attack and defense methods, namely, the view of multi-order interactions between input variables of DNNs. Based on multi-order interactions, we discover that adversarial attacks mainly affect high-order interactions to fool the DNN. Furthermore, we find that the robustness of adversarially trained DNNs comes from category-specific low-order interactions. Our findings offer a potential way to unify adversarial perturbations and robustness, explaining existing robustness-boosting methods in a principled manner. In addition, our findings revise the previous, inaccurate understanding of the shape bias of adversarially learned features. Our code is available online at https://github.com/Jie-Ren/A-Unified-Game-Theoretic-Interpretation-of-Adversarial-Robustness.
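For concreteness, the multi-order interaction referenced above is typically defined as I^(m)(i, j) = E_{S ⊆ N\{i, j}, |S| = m}[f(S ∪ {i, j}) − f(S ∪ {i}) − f(S ∪ {j}) + f(S)], where f(S) denotes the network output when only the input variables in S are kept. Below is a minimal Python sketch of a Monte Carlo estimate of this quantity; it is an illustration under that assumed definition, and the function multi_order_interaction and the scoring function f are hypothetical placeholders, not code from the authors' repository.

import random

def multi_order_interaction(f, n_vars, i, j, m, n_samples=100):
    # Monte Carlo estimate of the m-order interaction I^(m)(i, j):
    # averages f(S ∪ {i,j}) - f(S ∪ {i}) - f(S ∪ {j}) + f(S) over
    # random contexts S of size m drawn from N \ {i, j}.
    others = [k for k in range(n_vars) if k != i and k != j]
    total = 0.0
    for _ in range(n_samples):
        s = frozenset(random.sample(others, m))
        total += f(s | {i, j}) - f(s | {i}) - f(s | {j}) + f(s)
    return total / n_samples

In practice, f(S) would be implemented by masking the variables outside S (e.g., with a baseline value) and reading out the DNN's classification score for the true category.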
Cite
Text
Ren et al. "Towards a Unified Game-Theoretic View of Adversarial Perturbations and Robustness." Neural Information Processing Systems, 2021.Markdown
[Ren et al. "Towards a Unified Game-Theoretic View of Adversarial Perturbations and Robustness." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/ren2021neurips-unified/)BibTeX
@inproceedings{ren2021neurips-unified,
  title     = {{Towards a Unified Game-Theoretic View of Adversarial Perturbations and Robustness}},
  author    = {Ren, Jie and Zhang, Die and Wang, Yisen and Chen, Lu and Zhou, Zhanpeng and Chen, Yiting and Cheng, Xu and Wang, Xin and Zhou, Meng and Shi, Jie and Zhang, Quanshi},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/ren2021neurips-unified/}
}