Convergence Analysis and Trajectory Comparison of Gradient Descent for Overparameterized Deep Linear Networks

Abstract

This paper presents a convergence analysis and trajectory comparison of the gradient descent (GD) method for overparameterized deep linear neural networks under different random initializations, demonstrating that the GD trajectory for these networks closely matches that of the corresponding convex optimization problem. The study touches upon a major open theoretical problem in machine learning: why are deep neural networks trained with GD methods efficient in so many practical applications? While a solution to this problem remains beyond reach for general nonlinear deep neural networks, extensive effort has been invested in studying related questions for deep linear neural networks, and many interesting results have been reported to date. For example, recent results on the loss landscape show that even though the loss function of a deep linear neural network is non-convex, every local minimizer is also a global minimizer. We focus on the trajectory of GD applied to deep linear networks and demonstrate that, with appropriate initialization and sufficient width of the hidden layers, the GD trajectory closely matches that of the corresponding convex optimization problem. This result holds regardless of the depth of the network, providing insight into the efficiency of GD in the training of deep neural networks. Furthermore, we show that the GD trajectory for an overparameterized deep linear network automatically avoids bad saddle points.
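The setting described in the abstract can be illustrated with a minimal numerical sketch (not the paper's construction): run GD on a small convex least-squares problem and, on the same data, on a depth-4 overparameterized linear network, then compare the end-to-end linear maps. The problem sizes, the fan-in-scaled random initialization, and the step sizes below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny realizable least-squares problem; GD on it is the convex baseline.
n, d = 20, 3
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)


def convex_gd(steps=500, lr=0.2):
    """GD on the convex objective (1/2n)||Xw - y||^2, started from zero."""
    w = np.zeros(d)
    traj = [w.copy()]
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / n
        traj.append(w.copy())
    return np.array(traj)


def deep_linear_gd(depth=4, width=64, steps=3000, lr=0.02):
    """GD on the same data through a deep linear network W_{L-1}...W_0;
    we track the end-to-end map w_eff at every step."""
    dims = [d] + [width] * (depth - 1) + [1]
    # Fan-in-scaled random initialization (an assumption of this sketch).
    Ws = [rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
          for i in range(depth)]
    traj = []
    for _ in range(steps):
        # Prefix products B[i] = W_{i-1}...W_0 and suffixes A[i] = W_{L-1}...W_{i+1}.
        B = [np.eye(d)]
        for W in Ws[:-1]:
            B.append(W @ B[-1])
        A = [np.eye(1)]
        for W in reversed(Ws[1:]):
            A.insert(0, A[0] @ W)
        w_eff = (A[0] @ Ws[0]).ravel()          # end-to-end linear map, shape (d,)
        traj.append(w_eff.copy())
        G = ((X @ w_eff - y) @ X)[None, :] / n  # dLoss/dw_eff as a (1, d) row
        # Chain rule through the matrix product: dLoss/dW_i = A_i^T G B_i^T.
        Ws = [W - lr * A[i].T @ G @ B[i].T for i, W in enumerate(Ws)]
    return np.array(traj)
```

Because the target here is realizable and the convex problem has a unique minimizer, both runs should end at the same least-squares solution, loosely illustrating the comparison the abstract describes; the two trajectories start from different points, since the network's random initialization induces a nonzero initial end-to-end map.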

Cite

Text

Zhao and Xu. "Convergence Analysis and Trajectory Comparison of Gradient Descent for Overparameterized Deep Linear Networks." Transactions on Machine Learning Research, 2024.

Markdown

[Zhao and Xu. "Convergence Analysis and Trajectory Comparison of Gradient Descent for Overparameterized Deep Linear Networks." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/zhao2024tmlr-convergence/)

BibTeX

@article{zhao2024tmlr-convergence,
  title     = {{Convergence Analysis and Trajectory Comparison of Gradient Descent for Overparameterized Deep Linear Networks}},
  author    = {Zhao, Hongru and Xu, Jinchao},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/zhao2024tmlr-convergence/}
}