SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance

Abstract

We study Stochastic Gradient Descent with AdaGrad stepsizes, a popular adaptive (self-tuning) method for first-order stochastic optimization. Although the method is well studied, existing analyses suffer from various shortcomings: they either assume some knowledge of the problem parameters, impose strong global Lipschitz conditions, or fail to give bounds that hold with high probability. We provide a comprehensive analysis of this basic method without any of these limitations, in both the convex and non-convex (smooth) cases; the analysis additionally supports a general “affine variance” noise model and provides sharp rates of convergence in both the low-noise and high-noise regimes.
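
For orientation, the method named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes the scalar ("AdaGrad-norm") stepsize eta / sqrt(gamma^2 + sum of squared gradient norms so far), and the names `adagrad_norm_sgd`, `noisy_quadratic_grad`, `eta`, `gamma`, `sigma0` and `sigma1` are hypothetical. The toy oracle adds noise whose variance is sigma0^2 + sigma1^2 * ||grad f(x)||^2, which is one common reading of an "affine variance" noise model.

```python
import math
import random


def adagrad_norm_sgd(grad_oracle, x0, steps, eta=1.0, gamma=1.0):
    """SGD with a scalar AdaGrad (AdaGrad-norm) stepsize.

    Assumed update (illustrative, not quoted from the paper):
        step_t  = eta / sqrt(gamma^2 + sum_{s<=t} ||g_s||^2)
        x_{t+1} = x_t - step_t * g_t
    """
    x = list(x0)
    sq_sum = gamma ** 2  # gamma^2 plus the running sum of squared gradient norms
    for _ in range(steps):
        g = grad_oracle(x)
        sq_sum += sum(gi * gi for gi in g)
        step = eta / math.sqrt(sq_sum)  # needs no smoothness or noise parameters
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def noisy_quadratic_grad(x, rng, sigma0=0.1, sigma1=0.1):
    """Stochastic gradient of f(x) = 0.5 * ||x||^2 (true gradient is x).

    The added Gaussian noise has total variance
    sigma0^2 + sigma1^2 * ||grad f(x)||^2, spread evenly over coordinates.
    """
    grad_norm_sq = sum(xi * xi for xi in x)
    per_coord_std = math.sqrt((sigma0 ** 2 + sigma1 ** 2 * grad_norm_sq) / len(x))
    return [xi + per_coord_std * rng.gauss(0.0, 1.0) for xi in x]


def norm(x):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(xi * xi for xi in x))


if __name__ == "__main__":
    rng = random.Random(0)
    x_final = adagrad_norm_sgd(lambda x: noisy_quadratic_grad(x, rng), [5.0, -3.0], steps=2000)
    print("final distance to the minimizer:", norm(x_final))
```

The stepsize is computed from observed gradients only, which is the sense in which the method is self-tuning; the values of `eta` and `gamma` above are arbitrary defaults chosen for the toy problem.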

Cite

Text

Attia and Koren. "SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance." International Conference on Machine Learning, 2023.

Markdown

[Attia and Koren. "SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/attia2023icml-sgd/)

BibTeX

@inproceedings{attia2023icml-sgd,
  title     = {{SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance}},
  author    = {Attia, Amit and Koren, Tomer},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {1147--1171},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/attia2023icml-sgd/}
}