SVRG Meets AdaGrad: Painless Variance Reduction

Abstract

Variance reduction (VR) methods for finite-sum minimization typically require the knowledge of problem-dependent constants that are often unknown and difficult to estimate. To address this, we use ideas from adaptive gradient methods to propose AdaSVRG, which is a more robust variant of SVRG, a common VR method. AdaSVRG uses AdaGrad, a common adaptive gradient method, in the inner loop of SVRG, making it robust to the choice of step-size. When minimizing a sum of n smooth convex functions, we prove that a variant of AdaSVRG requires $\tilde{O}(n + 1/\epsilon)$ gradient evaluations to achieve an $O(\epsilon)$-suboptimality, matching the typical rate, but without needing to know problem-dependent constants. Next, we show that the dynamics of AdaGrad exhibit a two-phase behavior: the step-size remains approximately constant in the first phase and then decreases at an $O(1/\sqrt{t})$ rate. This result may be of independent interest, and it allows us to propose a heuristic that adaptively determines the length of each inner loop in AdaSVRG. Via experiments on synthetic and real-world datasets, we validate the robustness and effectiveness of AdaSVRG, demonstrating its superior performance over standard and other “tune-free” VR methods.
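The abstract describes AdaSVRG as SVRG with AdaGrad run in the inner loop. Below is a minimal sketch of that idea, assuming a scalar (gradient-norm) version of AdaGrad inside each SVRG epoch and averaging of the inner iterates to form the next snapshot; the function names, the fixed inner-loop length, and the least-squares example are illustrative assumptions, not the paper's exact algorithm or heuristic.

```python
import numpy as np

def adasvrg(grad_i, full_grad, w0, n, eta=1.0, outer_iters=20, inner_iters=None):
    """Sketch of AdaSVRG: SVRG variance-reduced gradients with AdaGrad step-sizes.

    grad_i(w, i): gradient of the i-th component function at w.
    full_grad(w): gradient of the full (averaged) objective at w.
    """
    if inner_iters is None:
        inner_iters = n  # illustrative choice; the paper adapts this length heuristically
    w_snap = np.asarray(w0, dtype=float).copy()
    for _ in range(outer_iters):
        mu = full_grad(w_snap)            # full gradient at the snapshot
        x = w_snap.copy()
        x_sum = np.zeros_like(x)
        acc = 0.0                         # AdaGrad accumulator of squared gradient norms
        for _ in range(inner_iters):
            i = np.random.randint(n)
            g = grad_i(x, i) - grad_i(w_snap, i) + mu   # SVRG variance-reduced estimate
            acc += np.dot(g, g)
            x = x - (eta / np.sqrt(acc)) * g            # AdaGrad (scalar) step-size
            x_sum += x
        w_snap = x_sum / inner_iters      # next snapshot: average of inner iterates
    return w_snap

# Hypothetical usage on a small least-squares finite sum.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 10)), rng.normal(size=100)
grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]
full_grad = lambda w: A.T @ (A @ w - b) / len(b)
w = adasvrg(grad_i, full_grad, np.zeros(10), n=100)
```

The scalar AdaGrad accumulator makes the inner step-size shrink automatically, which is what removes the need to know smoothness constants when tuning the SVRG step-size.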

Cite

Text

Dubois-Taine et al. "SVRG Meets AdaGrad: Painless Variance Reduction." Machine Learning, 2022. doi:10.1007/S10994-022-06265-X

Markdown

[Dubois-Taine et al. "SVRG Meets AdaGrad: Painless Variance Reduction." Machine Learning, 2022.](https://mlanthology.org/mlj/2022/duboistaine2022mlj-svrg/) doi:10.1007/S10994-022-06265-X

BibTeX

@article{duboistaine2022mlj-svrg,
  title     = {{SVRG Meets AdaGrad: Painless Variance Reduction}},
  author    = {Dubois-Taine, Benjamin and Vaswani, Sharan and Babanezhad, Reza and Schmidt, Mark and Lacoste-Julien, Simon},
  journal   = {Machine Learning},
  year      = {2022},
  pages     = {4359--4409},
  doi       = {10.1007/S10994-022-06265-X},
  volume    = {111},
  url       = {https://mlanthology.org/mlj/2022/duboistaine2022mlj-svrg/}
}