Gradient Flow Provably Learns Robust Classifiers for Orthonormal GMMs

Abstract

Deep learning-based classifiers are known to be vulnerable to adversarial attacks. Existing methods for defending against such attacks require adding a defense mechanism or modifying the learning procedure (e.g., by adding adversarial examples during training). This paper shows that, for certain data distributions, one can learn a provably robust classifier using standard learning methods and without adding a defense mechanism. More specifically, the paper addresses the problem of finding a robust classifier for binary classification when the data comes from an isotropic mixture of Gaussians with orthonormal cluster centers. First, we characterize the largest $\ell_2$-attack any classifier can defend against while maintaining high accuracy, and show the existence of optimal robust classifiers achieving this maximum $\ell_2$-robustness. Next, we show that, given data from the orthonormal Gaussian mixture model, gradient flow on a two-layer network with a polynomial ReLU activation, trained without adversarial examples, provably finds an optimal robust classifier.
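
The following is a minimal sketch of the setting described in the abstract, not the paper's actual construction or experiments: it assumes K orthonormal cluster centers each assigned a ±1 label, isotropic Gaussian noise, a two-layer network whose hidden units use a polynomial ReLU of the form max(z, 0)^p (one common reading of "polynomial ReLU"; the paper's definition may differ), and small-step gradient descent on the logistic loss as a discrete surrogate for gradient flow. All dimensions, hyperparameters, and initialization scales are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical problem sizes: ambient dimension, clusters, samples, noise level, pReLU power.
d, K, n, sigma, p = 64, 4, 512, 0.1, 3

# Orthonormal cluster centers (columns of a random orthogonal matrix), each with a +/-1 label.
Q, _ = np.linalg.qr(rng.standard_normal((d, K)))
mu = Q.T                                           # K x d, rows are orthonormal centers
labels = np.where(np.arange(K) % 2 == 0, 1.0, -1.0)

# Sample from the isotropic Gaussian mixture: x = mu_k + sigma * noise, y = label of cluster k.
k = rng.integers(0, K, size=n)
X = mu[k] + sigma * rng.standard_normal((n, d))
y = labels[k]

# Two-layer network f(x) = sum_j a_j * prelu(w_j . x), with prelu(z) = max(z, 0)^p.
h = 32
W = 0.01 * rng.standard_normal((h, d))
a = 0.01 * rng.standard_normal(h)

def prelu(z):
    return np.maximum(z, 0.0) ** p

def forward(X):
    return prelu(X @ W.T) @ a

# Small-step gradient descent on the logistic loss as a discrete stand-in for gradient flow.
lr = 0.05
for _ in range(2000):
    z = X @ W.T                                    # n x h pre-activations
    f = prelu(z) @ a                               # network outputs
    g = -y / (1.0 + np.exp(y * f)) / n             # d(loss)/d(f) for the logistic loss
    grad_a = prelu(z).T @ g
    grad_W = (p * np.maximum(z, 0.0) ** (p - 1) * (g[:, None] * a[None, :])).T @ X
    a -= lr * grad_a
    W -= lr * grad_W

print("train accuracy:", np.mean(np.sign(forward(X)) == y))
```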

Cite

Text

Min and Vidal. "Gradient Flow Provably Learns Robust Classifiers for Orthonormal GMMs." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Min and Vidal. "Gradient Flow Provably Learns Robust Classifiers for Orthonormal GMMs." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/min2025icml-gradient/)

BibTeX

@inproceedings{min2025icml-gradient,
  title     = {{Gradient Flow Provably Learns Robust Classifiers for Orthonormal GMMs}},
  author    = {Min, Hancheng and Vidal, Rene},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {44292--44350},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/min2025icml-gradient/}
}