Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

Abstract

We study the dynamics of gradient descent on objective functions of the form $f(\prod_{i=1}^{k} w_i)$ (with respect to scalar parameters $w_1,\ldots,w_k$), which arise in the context of training depth-$k$ linear neural networks. We prove that for standard random initializations, and under mild assumptions on $f$, the number of iterations required for convergence scales exponentially with the depth $k$. We also show empirically that this phenomenon can occur in higher dimensions, where each $w_i$ is a matrix. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where $k$ is large.
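To make the setting concrete, below is a minimal NumPy sketch (not from the paper) of gradient descent on $f(\prod_{i=1}^{k} w_i)$, using the illustrative choices $f(x) = \tfrac{1}{2}(x-1)^2$, a standard Gaussian initialization, and a fixed step size; the iteration counts it prints for increasing depth $k$ are only meant to illustrate, not reproduce, the exponential-in-depth slowdown studied in the paper.

```python
import numpy as np

def gd_iterations(k, lr=1e-2, max_steps=100_000, tol=1e-3, seed=0):
    """Gradient descent on f(prod_i w_i) with the illustrative target
    f(x) = (x - 1)^2 / 2; returns the iteration count until the
    objective drops below tol, or max_steps if the budget runs out."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(k)              # standard random initialization
    for t in range(max_steps):
        p = np.prod(w)                      # product of the scalar parameters
        if 0.5 * (p - 1.0) ** 2 < tol:      # objective small enough: converged
            return t
        # Chain rule: d/dw_j f(prod_i w_i) = f'(p) * prod_{i != j} w_i
        grad = np.array([(p - 1.0) * np.prod(np.delete(w, j)) for j in range(k)])
        w -= lr * grad
    return max_steps                        # budget exhausted without converging

for k in [1, 2, 4, 6, 8]:
    print(f"depth k={k}: {gd_iterations(k)} iterations")
```

With these (hypothetical) settings, shallow products converge quickly, while deeper ones stall because the product of roughly unit-scale scalars, and hence every partial derivative, is exponentially small in $k$.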

Cite

Text

Shamir. "Exponential Convergence Time of Gradient Descent for    One-Dimensional Deep Linear Neural Networks." Conference on Learning Theory, 2019.

Markdown

[Shamir. "Exponential Convergence Time of Gradient Descent for    One-Dimensional Deep Linear Neural Networks." Conference on Learning Theory, 2019.](https://mlanthology.org/colt/2019/shamir2019colt-exponential/)

BibTeX

@inproceedings{shamir2019colt-exponential,
  title     = {{Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks}},
  author    = {Shamir, Ohad},
  booktitle = {Conference on Learning Theory},
  year      = {2019},
  pages     = {2691--2713},
  volume    = {99},
  url       = {https://mlanthology.org/colt/2019/shamir2019colt-exponential/}
}