Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games

Abstract

Adversarial training is a standard technique for training adversarially robust models. In this paper, we study adversarial training as an alternating best-response strategy in a two-player zero-sum game. We prove that even in a simple scenario of a linear classifier and a statistical model that abstracts robust vs. non-robust features, the alternating best-response strategy of such a game may not converge. On the other hand, a unique pure Nash equilibrium of the game exists and is provably robust. We support our theoretical results with experiments, showing the non-convergence of adversarial training and the robustness of the Nash equilibrium.
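The non-convergence of alternating best responses can be seen in a much simpler setting than the paper's classifier-vs-attacker model. The sketch below (an illustrative toy, not the paper's construction) uses the classic zero-sum game of matching pennies: alternating best-response dynamics cycle forever, even though a (mixed) Nash equilibrium of the game exists.

```python
import numpy as np

# Toy illustration, not the paper's model: matching pennies.
# Row player's payoff matrix; the column player receives the negation.
A = np.array([[1, -1],
              [-1, 1]])

def best_response_row(col_action):
    # Row player maximizes its payoff against the column player's action.
    return int(np.argmax(A[:, col_action]))

def best_response_col(row_action):
    # Column player minimizes the row player's payoff (zero-sum).
    return int(np.argmin(A[row_action, :]))

row, col = 0, 0
history = []
for _ in range(8):
    row = best_response_row(col)
    col = best_response_col(row)
    history.append((row, col))

print(history)  # the action pair oscillates and never settles
```

Running this shows the joint action alternating between `(0, 1)` and `(1, 0)` indefinitely, a minimal analogue of the cycling behavior the paper establishes for adversarial training.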

Cite

Text

Balcan et al. "Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games." Artificial Intelligence and Statistics, 2023.

Markdown

[Balcan et al. "Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/balcan2023aistats-nash/)

BibTeX

@inproceedings{balcan2023aistats-nash,
  title     = {{Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games}},
  author    = {Balcan, Maria-Florina and Pukdee, Rattana and Ravikumar, Pradeep and Zhang, Hongyang},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {9607--9636},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/balcan2023aistats-nash/}
}