Variance Reduced Model Based Methods: New Rates and Adaptive Step Sizes
Abstract
Variance-reduced gradient methods were introduced to control the variance of SGD (Stochastic Gradient Descent). Model-based methods are able to make use of a known lower bound on the loss; for instance, most common loss functions are non-negative. We show how these two classes of methods can be seamlessly combined. As an example we present a Model-based Stochastic Average Gradient method (MSAG), which results from using a truncated model together with the SAG method. At each iteration MSAG computes an adaptive learning rate based on a given known lower bound. When given access to the optimal objective as the lower bound, MSAG has several favorable convergence properties, including monotonic iterates and convergence in the non-smooth, smooth, and strongly convex settings. Our convergence theorems show that we can essentially trade off knowing the smoothness constant $L_{\max}$ for knowing the optimal objective in order to achieve the favorable convergence of variance-reduced gradient methods. Moreover, our convergence proofs for MSAG are very simple, in contrast to the complexity of the original convergence proofs of SAG.
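The abstract only sketches the method, so the following is a minimal, hypothetical Python sketch of the idea it describes: a SAG-style table of stored gradients combined with a truncated-model (Polyak-type) step size computed from a known lower bound on each loss. The problem instance, the cap `gamma_max`, the choice of lower bound `lower_bound = 0.0`, and the exact way the adaptive step is paired with the averaged gradient are all assumptions for illustration, not the authors' precise MSAG update.

```python
# Hypothetical sketch: SAG-style gradient averaging + a truncated-model step
# size built from a known lower bound (here 0 for squared losses).
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d)      # realizable problem, so 0 is a valid lower bound

def loss_grad(x, i):
    r = A[i] @ x - b[i]
    return 0.5 * r**2, r * A[i]     # f_i(x) and its gradient

x = np.zeros(d)
grad_table = np.zeros((n, d))       # SAG: last-seen gradient of each f_i
avg_grad = np.zeros(d)              # running average of the stored gradients
gamma_max = 0.5                     # cap on the adaptive step (assumed)
lower_bound = 0.0                   # known lower bound on each f_i (assumed)

for k in range(2000):
    i = rng.integers(n)
    fi, gi = loss_grad(x, i)

    # SAG-style update of the average of stored gradients
    avg_grad += (gi - grad_table[i]) / n
    grad_table[i] = gi

    # Truncated-model (Polyak-type) step: distance until the linear model of
    # f_i hits the known lower bound, capped at gamma_max.
    denom = gi @ gi
    step = min(gamma_max, (fi - lower_bound) / denom) if denom > 0 else 0.0

    x -= step * avg_grad

print("full loss after training:", 0.5 * np.mean((A @ x - b) ** 2))
```

The key point the sketch tries to convey is that the step size is computed from the gap between the current loss and its known lower bound, rather than from the smoothness constant $L_{\max}$.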
Cite
Text
Gower et al. "Variance Reduced Model Based Methods: New Rates and Adaptive Step Sizes." NeurIPS 2023 Workshops: OPT, 2023.
Markdown
[Gower et al. "Variance Reduced Model Based Methods: New Rates and Adaptive Step Sizes." NeurIPS 2023 Workshops: OPT, 2023.](https://mlanthology.org/neuripsw/2023/gower2023neuripsw-variance/)
BibTeX
@inproceedings{gower2023neuripsw-variance,
title = {{Variance Reduced Model Based Methods: New Rates and Adaptive Step Sizes}},
author = {Gower, Robert M. and Kunstner, Frederik and Schmidt, Mark},
booktitle = {NeurIPS 2023 Workshops: OPT},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/gower2023neuripsw-variance/}
}