The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models

Abstract

The study of the implicit regularization induced by gradient-based optimization in deep learning is a long-standing pursuit. In this paper, we characterize the implicit regularization of momentum gradient descent (MGD) in the continuous-time view, the so-called momentum gradient flow (MGF). We show that, for deep linear neural networks, the components of the weight vector are learned at different evolution rates, and this evolution gap increases with the depth. First, we show that if the depth equals one, the evolution gap between the weight vector components is linear, which is consistent with the behavior of ridge regression. In particular, we establish a tight coupling between MGF and ridge regression for least squares regression: when the regularization parameter of ridge regression is inversely proportional to the square of the time parameter of MGF, the risk of MGF is no more than 1.54 times that of ridge regression, and their relative Bayesian risks are almost indistinguishable. Second, if the model becomes deeper, i.e., the depth is greater than or equal to two, the evolution gap becomes more significant, which implies an implicit bias towards sparse solutions. Numerical experiments strongly support our theoretical results.
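As a rough illustration of the MGF-ridge coupling described above, the following sketch (our own, not code from the paper) runs heavy-ball momentum gradient descent on a synthetic least-squares problem and compares it against the closed-form ridge estimator whose regularization parameter is scaled as the inverse square of the iteration count. The step size, momentum coefficient, and the constant in the lambda ~ 1/t^2 matching are illustrative assumptions, and discrete iterations stand in for the continuous time parameter of MGF.

```python
# Illustrative sketch (assumptions, not the paper's method): heavy-ball momentum
# GD on unregularized least squares vs. ridge regression with lambda ~ 1/t^2.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

def momentum_gd(X, y, lr=1e-3, beta=0.9, steps=5000):
    """Heavy-ball momentum GD on the loss 0.5 * ||Xw - y||^2 / n."""
    w = np.zeros(X.shape[1])
    v = np.zeros_like(w)
    trajectory = []
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        v = beta * v - lr * grad   # momentum (velocity) update
        w = w + v
        trajectory.append(w.copy())
    return np.array(trajectory)

def ridge(X, y, lam):
    """Closed-form ridge regression estimator."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X / len(y) + lam * np.eye(d),
                           X.T @ y / len(y))

traj = momentum_gd(X, y)
for step in (100, 1000, 5000):
    w_mgd = traj[step - 1]
    lam = 1.0 / step**2            # matching lambda ~ 1/t^2, up to constants
    w_ridge = ridge(X, y, lam)
    print(f"step {step:5d}  "
          f"MGD error {np.linalg.norm(w_mgd - w_true):.4f}  "
          f"ridge error {np.linalg.norm(w_ridge - w_true):.4f}")
```

Under this heuristic matching, the two parameter-estimation errors track each other as training proceeds, which is the qualitative behavior the abstract's risk bound formalizes.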

Cite

Text

Wang et al. "The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I8.26209

Markdown

[Wang et al. "The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/wang2023aaai-implicit/) doi:10.1609/AAAI.V37I8.26209

BibTeX

@inproceedings{wang2023aaai-implicit,
  title     = {{The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models}},
  author    = {Wang, Li and Fu, Zhiguo and Zhou, Yingcong and Yan, Zili},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {10149--10156},
  doi       = {10.1609/AAAI.V37I8.26209},
  url       = {https://mlanthology.org/aaai/2023/wang2023aaai-implicit/}
}