Convergence of the Gradient Flow for Shallow ReLU Networks on Weakly Interacting Data

Abstract

We analyse the convergence of one-hidden-layer ReLU networks trained by gradient flow on $n$ data points. Our main contribution leverages the high dimensionality of the ambient space, which implies low correlation of the input samples, to show that a network of width of order $\log(n)$ neurons suffices for global convergence with high probability. Our analysis uses a Polyak–Łojasiewicz viewpoint along the gradient-flow trajectory, which yields an exponential rate of convergence with rate constant of order $\frac{1}{n}$. When the data are exactly orthogonal, we give a further refined characterization of the convergence speed, proving that its asymptotic behavior lies between the orders $\frac{1}{n}$ and $\frac{1}{\sqrt{n}}$, and exhibiting a phase-transition phenomenon in the convergence rate, during which it evolves from the lower bound to the upper one within a relative time of order $\frac{1}{\log(n)}$.
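
To make the setting concrete, below is a minimal sketch (not the authors' code) of the kind of training problem the abstract describes: $n$ inputs drawn in a high ambient dimension $d$, so that samples are weakly correlated, a one-hidden-layer ReLU network of width of order $\log(n)$, and small-step gradient descent as an Euler discretization of the gradient flow on the squared loss. All variable names, targets, and hyperparameters here are illustrative assumptions, not the paper's experimental setup.

# Minimal sketch of the setting: near-orthogonal high-dimensional inputs,
# a width-m (~ log n) one-hidden-layer ReLU network, and gradient descent
# as a discretization of the gradient flow on the squared loss.
import numpy as np

rng = np.random.default_rng(0)

n, d = 200, 5000                                   # high dimension => weakly correlated samples
m = max(4, int(np.ceil(np.log(n))))                # hidden width of order log(n)

X = rng.standard_normal((n, d)) / np.sqrt(d)       # inputs are nearly orthonormal w.h.p.
y = rng.choice([-1.0, 1.0], size=n)                # arbitrary illustrative targets

W = rng.standard_normal((m, d)) / np.sqrt(d)       # trainable hidden-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed outer-layer weights

def loss(W):
    # L(W) = (1/2n) * sum_i (f(x_i) - y_i)^2 with f(x) = sum_j a_j relu(w_j . x)
    r = np.maximum(X @ W.T, 0.0) @ a - y
    return 0.5 * np.mean(r ** 2)

lr, steps = 0.5, 2000                              # small steps approximate the gradient flow
for t in range(steps):
    pre = X @ W.T                                  # (n, m) pre-activations
    r = np.maximum(pre, 0.0) @ a - y               # residuals f(x_i) - y_i
    # dL/dw_j = (1/n) sum_i r_i * a_j * 1{w_j . x_i > 0} * x_i
    grad = ((r[:, None] * (pre > 0) * a[None, :]).T @ X) / n
    W -= lr * grad

print(f"width m={m}, final loss {loss(W):.3e}")

Under a Polyak–Łojasiewicz-type inequality along such a trajectory, $\|\nabla L(W_t)\|^2 \ge 2\mu\, L(W_t)$ with $\mu$ of order $\frac{1}{n}$, the gradient flow satisfies $L(W_t) \le e^{-2\mu t} L(W_0)$, which is the exponential rate referred to above.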

Cite

Text

Dana et al. "Convergence of the Gradient Flow for Shallow ReLU Networks on Weakly Interacting Data." Advances in Neural Information Processing Systems, 2025.

Markdown

[Dana et al. "Convergence of the Gradient Flow for Shallow ReLU Networks on Weakly Interacting Data." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/dana2025neurips-convergence/)

BibTeX

@inproceedings{dana2025neurips-convergence,
  title     = {{Convergence of the Gradient Flow for Shallow ReLU Networks on Weakly Interacting Data}},
  author    = {Dana, Léo and Pillaud-Vivien, Loucas and Bach, Francis},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/dana2025neurips-convergence/}
}