Gradient Methods Provably Converge to Non-Robust Networks

Abstract

Despite a great deal of research, it is still unclear why neural networks are so susceptible to adversarial examples. In this work, we identify natural settings where depth-$2$ ReLU networks trained with gradient flow are provably non-robust (susceptible to small adversarial $\ell_2$-perturbations), even when robust networks that classify the training dataset correctly exist. Perhaps surprisingly, we show that the well-known implicit bias towards margin maximization induces a bias towards non-robust networks, by proving that every network which satisfies the KKT conditions of the max-margin problem is non-robust.
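
To make the object referenced above concrete, the margin-maximization problem can be written in its standard form. This is only an illustrative sketch (the network $N_\theta$, the dataset $\{(x_i, y_i)\}_{i=1}^n$ with labels $y_i \in \{\pm 1\}$, and the multipliers $\lambda_i$ are generic notation, not necessarily the paper's exact setup):

$$\min_{\theta} \ \tfrac{1}{2}\|\theta\|_2^2 \quad \text{s.t.} \quad y_i\, N_\theta(x_i) \ge 1 \ \text{ for all } i \in [n],$$

and a network satisfies the KKT conditions of this problem if it is feasible and there exist multipliers $\lambda_1, \dots, \lambda_n \ge 0$ with

$$\theta = \sum_{i=1}^n \lambda_i\, y_i\, \nabla_\theta N_\theta(x_i), \qquad \lambda_i = 0 \ \text{ whenever } \ y_i\, N_\theta(x_i) \neq 1.$$

(Since ReLU networks are non-differentiable, the gradient here is understood as an element of the Clarke subdifferential.)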

Cite

Text

Vardi et al. "Gradient Methods Provably Converge to Non-Robust Networks." Neural Information Processing Systems, 2022.

Markdown

[Vardi et al. "Gradient Methods Provably Converge to Non-Robust Networks." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/vardi2022neurips-gradient/)

BibTeX

@inproceedings{vardi2022neurips-gradient,
  title     = {{Gradient Methods Provably Converge to Non-Robust Networks}},
  author    = {Vardi, Gal and Yehudai, Gilad and Shamir, Ohad},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/vardi2022neurips-gradient/}
}