Saddle-to-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape

Abstract

When a deep ReLU network is initialized with small weights, gradient descent (GD) is at first dominated by the saddle at the origin in parameter space. We study the so-called escape directions along which GD leaves the origin, which play a role similar to that of the eigenvectors of the Hessian for strict saddles. We show that the optimal escape direction features a low-rank bias in its deeper layers: the first singular value of the $\ell$-th layer weight matrix is at least $\ell^{\frac{1}{4}}$ larger than any other singular value. We also prove a number of related results about these escape directions. We suggest that deep ReLU networks exhibit saddle-to-saddle dynamics, with GD visiting a sequence of saddles with increasing bottleneck rank.
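
The singular-value statement in the abstract can be checked numerically on a given weight matrix. The following is a minimal sketch, not the authors' code. It assumes that "at least $\ell^{\frac{1}{4}}$ larger" is meant as a ratio between the largest and second-largest singular values, which is one reading of the abstract. The helper names `top_singular_gap` and `satisfies_gap` are hypothetical.

```python
import numpy as np


def top_singular_gap(weight):
    """Return s_1 / s_2, the ratio of the two largest singular values.

    Expects a 2-D array with at least two singular values.
    """
    # np.linalg.svd returns singular values sorted in descending order.
    s = np.linalg.svd(weight, compute_uv=False)
    return s[0] / s[1]


def satisfies_gap(weight, layer_index):
    """Check s_1 / s_2 >= layer_index ** (1/4).

    ASSUMPTION: the abstract's bound is read here as a ratio; the paper
    itself should be consulted for the precise statement.
    """
    return bool(top_singular_gap(weight) >= layer_index ** 0.25)


# Illustrative input only: a rank-one matrix plus a small perturbation,
# standing in for a deeper-layer weight matrix with a dominant direction.
rng = np.random.default_rng(0)
u = rng.standard_normal((8, 1))
v = rng.standard_normal((1, 8))
example_weight = u @ v + 1e-3 * rng.standard_normal((8, 8))
example_gap = top_singular_gap(example_weight)
```

Applying `satisfies_gap` to each layer's weights, with `layer_index` set to the layer's depth $\ell$, gives a quick empirical check of whether a trained or escaping network shows the described bias under this reading.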

Cite

Text

Bantzis et al. "Saddle-to-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape." International Conference on Learning Representations, 2026.

Markdown

[Bantzis et al. "Saddle-to-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/bantzis2026iclr-saddletosaddle/)

BibTeX

@inproceedings{bantzis2026iclr-saddletosaddle,
  title     = {{Saddle-to-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape}},
  author    = {Bantzis, Ioannis and Simon, James B and Jacot, Arthur},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/bantzis2026iclr-saddletosaddle/}
}