WAT: Improve the Worst-Class Robustness in Adversarial Training

Abstract

Deep Neural Networks (DNNs) have been shown to be vulnerable to adversarial examples. Adversarial training (AT) is a popular and effective strategy for defending against adversarial attacks. Recent works have shown that a robust model well-trained by AT exhibits a remarkable robustness disparity among classes, and have proposed various methods to obtain consistent robust accuracy across classes. Unfortunately, these methods sacrifice a good deal of average robust accuracy. Accordingly, this paper proposes a novel framework of worst-class adversarial training and leverages no-regret dynamics to solve this problem. Our goal is to obtain a classifier with strong performance on the worst class while sacrificing only a little average robust accuracy. We then rigorously analyze the theoretical properties of the proposed algorithm and derive a generalization error bound in terms of the worst-class robust risk. Furthermore, we propose a measurement that evaluates the proposed method in terms of both average and worst-class accuracies. Experiments on various datasets and networks show that the proposed method outperforms state-of-the-art approaches.
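The abstract mentions solving the worst-class robustness problem via no-regret dynamics. A standard no-regret scheme for such a min-max objective is multiplicative weights over per-class losses; the sketch below illustrates that generic idea only (the function name, learning rate, and loss data are illustrative assumptions, not the paper's actual algorithm).

```python
import numpy as np

def multiplicative_weights(class_losses, eta=0.1):
    """Illustrative multiplicative-weights (no-regret) updates over
    per-class robust losses: classes with higher loss receive higher
    weight, steering training attention toward the worst class.
    `eta` is a hypothetical learning-rate choice."""
    num_classes = class_losses.shape[1]
    w = np.full(num_classes, 1.0 / num_classes)  # start uniform
    for losses in class_losses:  # one row of per-class losses per round
        w = w * np.exp(eta * losses)  # upweight high-loss classes
        w = w / w.sum()               # renormalize onto the simplex
    return w

# Toy example: class 0 persistently incurs the highest robust loss,
# so its weight grows toward 1 over rounds.
losses = np.tile(np.array([[1.0, 0.2, 0.2]]), (50, 1))
w = multiplicative_weights(losses)
```

In an actual AT loop, such weights would rescale each class's contribution to the training loss each epoch; this snippet only demonstrates the weight dynamics in isolation.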

Cite

Text

Li and Liu. "WAT: Improve the Worst-Class Robustness in Adversarial Training." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I12.26749

Markdown

[Li and Liu. "WAT: Improve the Worst-Class Robustness in Adversarial Training." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/li2023aaai-wat/) doi:10.1609/AAAI.V37I12.26749

BibTeX

@inproceedings{li2023aaai-wat,
  title     = {{WAT: Improve the Worst-Class Robustness in Adversarial Training}},
  author    = {Li, Boqi and Liu, Weiwei},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {14982--14990},
  doi       = {10.1609/AAAI.V37I12.26749},
  url       = {https://mlanthology.org/aaai/2023/li2023aaai-wat/}
}