Convergence of the Gradient Flow for Shallow ReLU Networks on Weakly Interacting Data
Abstract
We analyse the convergence of one-hidden-layer ReLU networks trained by gradient flow on $n$ data points. Our main contribution leverages the high dimensionality of the ambient space, which implies low correlation between the input samples, to show that a network of width of order $\log(n)$ neurons suffices for global convergence with high probability. Our analysis uses a Polyak–Łojasiewicz viewpoint along the gradient-flow trajectory, which yields an exponential convergence rate of order $\frac{1}{n}$. When the data are exactly orthogonal, we give a further refined characterization of the convergence speed, proving that its asymptotic rate lies between the orders $\frac{1}{n}$ and $\frac{1}{\sqrt{n}}$, and exhibiting a phase-transition phenomenon in which the rate evolves from the lower bound to the upper bound within a relative time of order $\frac{1}{\log(n)}$.
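As a minimal sketch of the Polyak–Łojasiewicz (PL) argument mentioned in the abstract (not the paper's exact statement; the loss $L$ and constant $\mu$ are generic placeholders), a PL inequality holding along the gradient-flow trajectory gives exponential decay of the loss, with the abstract indicating an effective constant of order $\frac{1}{n}$:

```latex
% Generic PL argument under gradient flow (illustrative only; the abstract
% suggests the effective PL constant mu is of order 1/n in this setting).
\begin{align*}
  \dot{\theta}_t &= -\nabla L(\theta_t)
    && \text{(gradient flow)} \\
  \frac{\mathrm{d}}{\mathrm{d}t} L(\theta_t)
    &= -\|\nabla L(\theta_t)\|^2 \le -2\mu\, L(\theta_t)
    && \text{(PL inequality along the trajectory)} \\
  L(\theta_t) &\le e^{-2\mu t}\, L(\theta_0),
    && \text{with } \mu \text{ of order } \tfrac{1}{n}.
\end{align*}
```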
Cite
Text
Dana et al. "Convergence of the Gradient Flow for Shallow ReLU Networks on Weakly Interacting Data." Advances in Neural Information Processing Systems, 2025.
Markdown
[Dana et al. "Convergence of the Gradient Flow for Shallow ReLU Networks on Weakly Interacting Data." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/dana2025neurips-convergence/)
BibTeX
@inproceedings{dana2025neurips-convergence,
  title = {{Convergence of the Gradient Flow for Shallow ReLU Networks on Weakly Interacting Data}},
  author = {Dana, Léo and Pillaud-Vivien, Loucas and Bach, Francis},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2025},
  url = {https://mlanthology.org/neurips/2025/dana2025neurips-convergence/}
}