Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution

Abstract

We consider a variant of stochastic gradient descent (SGD) with a random learning rate and reveal its convergence properties. SGD is a widely used stochastic optimization algorithm in machine learning, especially deep learning. Numerous studies have revealed the convergence properties of SGD and its simplified variants. Among these, analyses based on the stationary distribution of the updated parameters provide generalizable results. However, to obtain a stationary distribution, the update direction of the parameters must not degenerate, which limits the applicable variants of SGD. In this study, we consider a novel SGD variant, Poisson SGD, whose parameter update directions degenerate and which instead utilizes a random learning rate. We demonstrate that the distribution of a parameter updated by Poisson SGD converges to a stationary distribution under weak assumptions on the loss function. Based on this result, we further show that Poisson SGD finds global minima in non-convex optimization problems, and we also evaluate the generalization error of this method. As a proof technique, we approximate the parameter distribution of Poisson SGD by that of the bouncy particle sampler (BPS) and derive its stationary distribution, using theoretical advances on piecewise deterministic Markov processes (PDMPs).
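The abstract does not spell out the exact Poisson SGD update rule. As a rough illustration only, the sketch below runs a plain SGD loop in which the learning rate is redrawn at random at every step; the exponential step-size distribution, the `rate` parameter, and the function name `poisson_sgd_sketch` are assumptions made here for illustration, not the construction defined in the paper.

```python
import numpy as np


def poisson_sgd_sketch(grad, theta0, n_steps=10_000, rate=10.0, rng=None):
    """Illustrative SGD loop with a random learning rate.

    NOTE: hypothetical simplification. Here the step size at each iteration
    is drawn from an exponential distribution with mean 1/rate; the actual
    Poisson SGD construction is specified in the paper, not reproduced here.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_steps):
        eta = rng.exponential(scale=1.0 / rate)  # random learning rate (assumed distribution)
        theta -= eta * grad(theta)               # standard SGD step with the random step size
    return theta


# Usage: a toy non-convex loss f(x) = x^4 - 3x^2 + x with noisy gradient estimates.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_grad = lambda x: 4 * x**3 - 6 * x + 1 + rng.normal(0.0, 0.1, size=x.shape)
    print(poisson_sgd_sketch(noisy_grad, theta0=np.array([2.0]), rate=50.0, rng=rng))
```

The only intended point of the sketch is that the randomness enters through the step size rather than through a non-degenerate update direction, which is the property the paper's stationary-distribution analysis relies on.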

Cite

Text

Yoshida et al. "Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution." ICML 2024 Workshops: HiLD, 2024.

Markdown

[Yoshida et al. "Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution." ICML 2024 Workshops: HiLD, 2024.](https://mlanthology.org/icmlw/2024/yoshida2024icmlw-effect/)

BibTeX

@inproceedings{yoshida2024icmlw-effect,
  title     = {{Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution}},
  author    = {Yoshida, Naoki and Nakakita, Shogo and Imaizumi, Masaaki},
  booktitle = {ICML 2024 Workshops: HiLD},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/yoshida2024icmlw-effect/}
}