SVRG Meets AdaGrad: Painless Variance Reduction
Abstract
Variance reduction (VR) methods for finite-sum minimization typically require knowledge of problem-dependent constants that are often unknown and difficult to estimate. To address this, we use ideas from adaptive gradient methods to propose AdaSVRG, a more robust variant of SVRG, a common VR method. AdaSVRG uses AdaGrad, a common adaptive gradient method, in the inner loop of SVRG, making it robust to the choice of step-size. When minimizing a sum of n smooth convex functions, we prove that a variant of AdaSVRG requires $\tilde{O}(n + 1/\epsilon)$ gradient evaluations to achieve an $O(\epsilon)$-suboptimality, matching the typical rate, but without needing to know problem-dependent constants. Next, we show that the dynamics of AdaGrad exhibit a two-phase behavior: the step-size remains approximately constant in the first phase, and then decreases at an $O(1/\sqrt{t})$ rate. This result may be of independent interest and allows us to propose a heuristic that adaptively determines the length of each inner loop in AdaSVRG. Via experiments on synthetic and real-world datasets, we validate the robustness and effectiveness of AdaSVRG, demonstrating its superior performance over standard and other "tune-free" VR methods.
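To make the high-level description concrete, here is a minimal sketch of the AdaSVRG idea from the abstract: an SVRG outer loop whose inner loop takes AdaGrad steps on the variance-reduced gradient estimate. The scalar (AdaGrad-norm) step-size, the fixed inner-loop length, the iterate averaging, and the toy least-squares problem below are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
# Sketch of SVRG with an AdaGrad-driven inner loop (illustrative, not the
# paper's exact method). grad_i(w, i) returns the gradient of the i-th
# component function at w.
import numpy as np

def adasvrg(grad_i, n, w0, eta=1.0, n_outer=20, inner_len=None, seed=0):
    rng = np.random.default_rng(seed)
    m = inner_len or n                      # inner-loop length (simple choice: n)
    w_snap = w0.copy()
    for _ in range(n_outer):
        # full gradient at the snapshot point (SVRG outer loop)
        full_grad = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        x = w_snap.copy()
        accum, avg = 0.0, np.zeros_like(x)  # AdaGrad accumulator, iterate average
        for _ in range(m):
            i = rng.integers(n)
            # variance-reduced gradient estimate
            g = grad_i(x, i) - grad_i(w_snap, i) + full_grad
            accum += g @ g                  # scalar (AdaGrad-norm) accumulation
            x = x - eta / np.sqrt(accum + 1e-12) * g
            avg += x / m
        w_snap = avg                        # next snapshot: average of inner iterates
    return w_snap

# Usage on a toy least-squares problem (illustrative).
n, d = 200, 10
A, b = np.random.randn(n, d), np.random.randn(n)
grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]
w = adasvrg(grad_i, n, np.zeros(d))
print(np.linalg.norm(A @ w - b) ** 2 / n)
```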
Cite
Text
Dubois-Taine et al. "SVRG Meets AdaGrad: Painless Variance Reduction." Machine Learning, 2022. doi:10.1007/S10994-022-06265-X
Markdown
[Dubois-Taine et al. "SVRG Meets AdaGrad: Painless Variance Reduction." Machine Learning, 2022.](https://mlanthology.org/mlj/2022/duboistaine2022mlj-svrg/) doi:10.1007/S10994-022-06265-X
BibTeX
@article{duboistaine2022mlj-svrg,
title = {{SVRG Meets AdaGrad: Painless Variance Reduction}},
author = {Dubois-Taine, Benjamin and Vaswani, Sharan and Babanezhad, Reza and Schmidt, Mark and Lacoste-Julien, Simon},
journal = {Machine Learning},
year = {2022},
pages = {4359-4409},
doi = {10.1007/S10994-022-06265-X},
volume = {111},
url = {https://mlanthology.org/mlj/2022/duboistaine2022mlj-svrg/}
}