Robustness Guarantees for Adversarially Trained Neural Networks

Abstract

We study robust adversarial training of two-layer neural networks as a bi-level optimization problem. In particular, for the inner loop that implements the adversarial attack during training using projected gradient descent (PGD), we propose maximizing a \emph{lower bound} on the $0/1$-loss by reflecting a surrogate loss about the origin. This allows us to give a convergence guarantee for the inner-loop PGD attack. Furthermore, assuming the data is linearly separable, we provide precise iteration complexity results for end-to-end adversarial training, which hold for any width and initialization. We provide empirical evidence to support our theoretical results.
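The sketch below illustrates the kind of bi-level training scheme the abstract describes: an inner PGD loop that perturbs each input to maximize a reflected surrogate loss, and an outer loop that minimizes the usual surrogate loss on the perturbed inputs. It is a minimal illustration, not the paper's exact algorithm; the choice of surrogate (base-2 logistic), the $\ell_\infty$ attack geometry, the radius, step sizes, and network width are all assumptions made for the example. The reflection follows one natural construction consistent with the abstract: if a surrogate $\ell$ upper-bounds the $0/1$-loss pointwise, then its reflection about the origin, $\tilde{\ell}(z) = 1 - \ell(-z)$, lower-bounds it.

```python
# Minimal sketch (assumed details, not the paper's exact algorithm) of
# adversarial training with an inner PGD attack that maximizes a reflected
# surrogate loss, as described in the abstract.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def surrogate(z):
    # Base-2 logistic loss: log2(1 + exp(-z)), an upper bound on 1[z <= 0].
    return F.softplus(-z) / math.log(2)


def reflected_surrogate(z):
    # Reflection about the origin: since surrogate(z) >= 1[z <= 0],
    # 1 - surrogate(-z) <= 1[z <= 0], i.e. a lower bound on the 0/1 loss.
    return 1.0 - surrogate(-z)


class TwoLayerNet(nn.Module):
    """Two-layer ReLU network with scalar output (width m is an assumption)."""

    def __init__(self, d, m):
        super().__init__()
        self.hidden = nn.Linear(d, m, bias=False)
        self.out = nn.Linear(m, 1, bias=False)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x))).squeeze(-1)


def pgd_attack(model, x, y, eps, alpha, steps):
    """Inner loop: PGD ascent on the reflected (lower-bound) loss over an
    l_inf ball of radius eps around x (attack geometry is an assumption)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        margin = y * model(x + delta)
        loss = reflected_surrogate(margin).sum()
        grad, = torch.autograd.grad(loss, delta)
        # Ascent step, then projection back onto the l_inf ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).detach()


def adversarial_train(model, data, eps=0.1, alpha=0.02, inner_steps=10,
                      lr=0.01, epochs=20):
    """Outer loop: minimize the surrogate loss on the perturbed inputs."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in data:  # labels y are assumed to lie in {-1, +1}
            x_adv = pgd_attack(model, x, y, eps, alpha, inner_steps)
            opt.zero_grad()
            surrogate(y * model(x_adv)).mean().backward()
            opt.step()
```

Because the inner objective is a lower bound on the $0/1$-loss rather than an upper-bounding surrogate, the attack optimizes a quantity that cannot overstate the classification error of the perturbed point, which is what makes the inner-loop convergence analysis in the paper possible; the hyperparameters above would need tuning in practice.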

Cite

Text

Mianjy and Arora. "Robustness Guarantees for Adversarially Trained Neural Networks." Neural Information Processing Systems, 2023.

Markdown

[Mianjy and Arora. "Robustness Guarantees for Adversarially Trained Neural Networks." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/mianjy2023neurips-robustness/)

BibTeX

@inproceedings{mianjy2023neurips-robustness,
  title     = {{Robustness Guarantees for Adversarially Trained Neural Networks}},
  author    = {Mianjy, Poorya and Arora, Raman},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/mianjy2023neurips-robustness/}
}