On Non-Local Convergence Analysis of Deep Linear Networks

Abstract

In this paper, we study the non-local convergence properties of deep linear networks. Specifically, under the quadratic loss, we consider optimizing deep linear networks that contain at least one layer with a single neuron. Under gradient flow, we characterize the convergence point of trajectories starting from an arbitrary balanced initialization, including trajectories that converge to one of the saddle points. We also establish explicit, stage-by-stage convergence rates for trajectories that converge to the global minimizers, and show that these rates range from polynomial to linear. To the best of our knowledge, our results provide the first non-local analysis of deep linear neural networks with arbitrary balanced initialization, in contrast to the lazy training regime that has dominated the literature on neural networks and to restricted benign initializations.
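The setting studied in the paper can be made concrete with a small numerical sketch. The Python/NumPy snippet below (not the authors' code; the layer widths, data, step size, and the particular rank-one balanced construction are all illustrative assumptions) builds a deep linear network W_N ... W_1 with a one-neuron bottleneck layer, initializes its factors so that the balancedness condition W_{j+1}^T W_{j+1} = W_j W_j^T holds at every layer, and runs small-step gradient descent on the quadratic loss as a discretization of gradient flow.

```python
# A minimal illustrative sketch of the setting in the abstract (not the
# authors' code): a deep linear network W_N ... W_1 with a one-neuron
# bottleneck layer, trained on the quadratic loss by small-step gradient
# descent as a discretization of gradient flow, starting from a balanced
# initialization. All widths, data, step size, and the rank-one balanced
# construction below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

dims = [4, 3, 1, 3, 4]                  # layer widths; the "1" is the one-neuron layer
m = 20                                  # number of samples
X = rng.standard_normal((dims[0], m))
Y = rng.standard_normal((dims[-1], m))


def product(Ws):
    """End-to-end matrix W_N ... W_1 (Ws[0] is the first layer)."""
    P = Ws[0]
    for W in Ws[1:]:
        P = W @ P
    return P


def balanced_init(dims, sigma=0.5):
    """Rank-one balanced factors W_j = sigma^(1/N) * a_{j+1} a_j^T with unit
    vectors a_j; these satisfy W_{j+1}^T W_{j+1} = W_j W_j^T for every j."""
    N = len(dims) - 1
    a = []
    for d in dims:
        v = rng.standard_normal(d)
        a.append(v / np.linalg.norm(v))
    s = sigma ** (1.0 / N)
    return [s * np.outer(a[j + 1], a[j]) for j in range(N)]


def gradients(Ws, X, Y):
    """Gradients of L(W) = 0.5 * ||W_N ... W_1 X - Y||_F^2 w.r.t. each factor."""
    N = len(Ws)
    R = product(Ws) @ X - Y                              # residual, shape (out, m)
    grads = []
    for j in range(N):
        left = product(Ws[j + 1:]) if j < N - 1 else np.eye(Ws[-1].shape[0])
        right = product(Ws[:j]) @ X if j > 0 else X
        grads.append(left.T @ R @ right.T)               # same shape as Ws[j]
    return grads


Ws = balanced_init(dims)
eta = 1e-3                                               # small step ~ gradient flow
for _ in range(50000):
    Ws = [W - eta * g for W, g in zip(Ws, gradients(Ws, X, Y))]

loss = 0.5 * np.linalg.norm(product(Ws) @ X - Y) ** 2
print(f"final quadratic loss: {loss:.4f}")
# With a width-1 layer, the attainable minimum is the best rank-one fit of Y
# by a linear map of X, not zero; which minimizer (or saddle point) the
# trajectory reaches depends on the balanced starting point.
```

Gradient flow preserves balancedness exactly; the small-step gradient descent above only preserves it approximately, which is why the step size is kept small in this sketch.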

Cite

Text

Chen et al. "On Non-Local Convergence Analysis of Deep Linear Networks." International Conference on Machine Learning, 2022.

Markdown

[Chen et al. "On Non-Local Convergence Analysis of Deep Linear Networks." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/chen2022icml-nonlocal/)

BibTeX

@inproceedings{chen2022icml-nonlocal,
  title     = {{On Non-Local Convergence Analysis of Deep Linear Networks}},
  author    = {Chen, Kun and Lin, Dachao and Zhang, Zhihua},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {3417--3443},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/chen2022icml-nonlocal/}
}