Robustness Guarantees for Adversarially Trained Neural Networks
Abstract
We study robust adversarial training of two-layer neural networks as a bi-level optimization problem. In particular, for the inner loop that implements the adversarial attack during training using projected gradient descent (PGD), we propose maximizing a \emph{lower bound} on the $0/1$-loss by reflecting a surrogate loss about the origin. This allows us to give a convergence guarantee for the inner-loop PGD attack. Furthermore, assuming the data is linearly separable, we provide precise iteration complexity results for end-to-end adversarial training, which hold for any width and initialization. We provide empirical evidence to support our theoretical results.
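To make the inner maximization concrete, below is a minimal PGD sketch, not the authors' code: it assumes, for illustration only, a two-layer ReLU network f(x) = v^T ReLU(Wx), an l_inf perturbation ball of radius eps, and a reflected logistic surrogate of the form 1 - log2(1 + e^z) as one possible lower bound on the 0/1-loss; the shapes, step sizes, and loss form are all assumptions.

```python
# Hypothetical sketch, not the paper's code: an inner-loop PGD attack that
# maximizes a reflected logistic loss, taken here (as an assumption) to be
# tilde_ell(z) = 1 - log2(1 + e^z), one concrete lower bound on the 0/1-loss.
import torch

def two_layer_net(x, W, v):
    """Two-layer ReLU network f(x) = v^T relu(W x); widths/shapes are illustrative."""
    return torch.relu(x @ W.T) @ v

def reflected_logistic(z):
    """Reflected (base-2) logistic surrogate; lower-bounds the 0/1-loss 1[z <= 0]."""
    return 1.0 - torch.log2(1.0 + torch.exp(z))

def pgd_attack(x, y, W, v, eps=0.1, alpha=0.02, steps=20):
    """Projected gradient ascent on the reflected loss over the l_inf ball of radius eps."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        margin = y * two_layer_net(x + delta, W, v)   # y takes values in {-1, +1}
        reflected_logistic(margin).sum().backward()   # ascend the lower bound
        with torch.no_grad():
            delta += alpha * delta.grad.sign()        # signed gradient step
            delta.clamp_(-eps, eps)                   # project back onto the ball
        delta.grad.zero_()
    return (x + delta).detach()

# Toy usage on random data (illustrative only).
torch.manual_seed(0)
x, y = torch.randn(8, 5), torch.sign(torch.randn(8))
W, v = torch.randn(16, 5), torch.randn(16)
x_adv = pgd_attack(x, y, W, v)
```

In the full bi-level training loop described in the abstract, a routine like this would serve as the inner attack, with the outer loop then updating W and v at the adversarial points x_adv.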
Cite
Text
Mianjy and Arora. "Robustness Guarantees for Adversarially Trained Neural Networks." Neural Information Processing Systems, 2023.Markdown
[Mianjy and Arora. "Robustness Guarantees for Adversarially Trained Neural Networks." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/mianjy2023neurips-robustness/)BibTeX
@inproceedings{mianjy2023neurips-robustness,
title = {{Robustness Guarantees for Adversarially Trained Neural Networks}},
author = {Mianjy, Poorya and Arora, Raman},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/mianjy2023neurips-robustness/}
}