Inequality Phenomenon in $l_{\infty}$-Adversarial Training, and Its Unrealized Threats
Abstract
The appearance of adversarial examples has drawn attention from both academia and industry. Along with the attack-defense arms race, adversarial training remains the most effective defense against adversarial examples. However, we find that inequality phenomena occur during $l_{\infty}$-adversarial training: a few features dominate the prediction made by the adversarially trained model. We systematically evaluate these inequality phenomena through extensive experiments and find that they become more pronounced as adversarial training is performed with increasing adversarial strength (measured by $\epsilon$). We hypothesize that such inequality phenomena make the $l_{\infty}$-adversarially trained model less reliable than the standardly trained model when the few ``important features'' are influenced. To validate our hypothesis, we propose two simple attacks that either perturb important features or replace them with noise or occlusion. Experiments show that an $l_{\infty}$-adversarially trained model can be easily attacked when these few important features are influenced. Our work sheds light on the limited practicality of $l_{\infty}$-adversarial training.
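As a concrete illustration of the kind of attack the abstract describes, below is a minimal sketch in PyTorch (the framework is an assumption) of an occlusion attack on important features: rank input pixels by saliency and replace the top-k with a constant fill. The saliency-based ranking and the gray-fill choice are illustrative assumptions, not the paper's exact procedure.

import torch

def occlude_top_k(model, x, y, k=100, fill=0.5):
    """Occlude the k most salient pixels of image x (shape [1, C, H, W])."""
    x = x.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Per-pixel importance: gradient magnitude summed over channels.
    saliency = x.grad.abs().sum(dim=1).flatten()          # [H*W]
    top = saliency.topk(k).indices
    mask = torch.ones_like(saliency)
    mask[top] = 0.0
    mask = mask.view(1, 1, *x.shape[-2:])
    # Replace the most important pixels with a constant (occlusion).
    return x.detach() * mask + fill * (1 - mask)

# Usage (hypothetical model, x, y):
# x_adv = occlude_top_k(model, x, y, k=50)
# flipped = model(x_adv).argmax(1) != y

If the inequality hypothesis holds, occluding even a small k should flip the $l_{\infty}$-adversarially trained model's predictions far more often than those of a standardly trained model.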
Cite
Text
Duan et al. "Inequality Phenomenon in $l_{\infty}$-Adversarial Training, and Its Unrealized Threats." International Conference on Learning Representations, 2023.

Markdown

[Duan et al. "Inequality Phenomenon in $l_{\infty}$-Adversarial Training, and Its Unrealized Threats." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/duan2023iclr-inequality/)

BibTeX
@inproceedings{duan2023iclr-inequality,
title = {{Inequality Phenomenon in $l_{\infty}$-Adversarial Training, and Its Unrealized Threats}},
author = {Duan, Ranjie and Chen, YueFeng and Zhu, Yao and Jia, Xiaojun and Zhang, Rong and Xue, Hui},
booktitle = {International Conference on Learning Representations},
year = {2023},
url = {https://mlanthology.org/iclr/2023/duan2023iclr-inequality/}
}