Parameter Free Dual Averaging: Optimizing Lipschitz Functions in a Single Pass
Abstract
Both gradient descent and dual averaging for convex Lipschitz functions have convergence rates that are highly dependent on the choice of learning rate. Even when the Lipschitz constant is known, setting the learning rate to achieve the optimal convergence rate requires knowing a bound $D$ on the distance from the initial point to the solution set. A number of approaches are known that relax this requirement, but they either require line searches, restarting (hyper-parameter grid search), or do not derive from the gradient descent or dual averaging frameworks (coin-betting). In this work we describe a single-pass method, with no back-tracking or line searches, derived from dual averaging, which does not require knowledge of $D$ yet asymptotically achieves the optimal rate of convergence for the complexity class of convex Lipschitz functions.
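To make the role of $D$ concrete, the sketch below implements classical dual averaging for a convex $G$-Lipschitz function. This is not the paper's parameter-free algorithm; it illustrates the baseline the abstract refers to, in which the step-size scale `lam` explicitly requires the distance bound $D$ that the paper's method removes. The function names and the toy objective are illustrative assumptions.

```python
import numpy as np

def dual_averaging(subgrad, x0, D, G, n_steps):
    """Classical (non-parameter-free) dual averaging sketch.

    The step-size scale lam = D / (G * sqrt(k)) depends on the distance
    bound D = ||x0 - x*||, which is exactly the quantity the paper's
    method avoids having to know in advance.
    """
    x = x0.copy()
    s = np.zeros_like(x0)       # running sum of subgradients
    x_avg = np.zeros_like(x0)   # averaged iterate, returned at the end
    for k in range(1, n_steps + 1):
        s += subgrad(x)
        lam = D / (G * np.sqrt(k))
        x = x0 - lam * s        # prox step w.r.t. squared Euclidean distance from x0
        x_avg += (x - x_avg) / k
    return x_avg

# Toy example: minimize the convex Lipschitz function f(x) = ||x||_1,
# whose subgradient norm is at most sqrt(2) in two dimensions.
x0 = np.array([3.0, -2.0])
x_hat = dual_averaging(lambda x: np.sign(x), x0,
                       D=np.linalg.norm(x0), G=np.sqrt(2.0), n_steps=1000)
print(x_hat)  # close to the minimizer at the origin
```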
Cite
Text
Defazio and Mishchenko. "Parameter Free Dual Averaging: Optimizing Lipschitz Functions in a Single Pass." NeurIPS 2022 Workshops: OPT, 2022.

Markdown

[Defazio and Mishchenko. "Parameter Free Dual Averaging: Optimizing Lipschitz Functions in a Single Pass." NeurIPS 2022 Workshops: OPT, 2022.](https://mlanthology.org/neuripsw/2022/defazio2022neuripsw-parameter/)

BibTeX
@inproceedings{defazio2022neuripsw-parameter,
title = {{Parameter Free Dual Averaging: Optimizing Lipschitz Functions in a Single Pass}},
author = {Defazio, Aaron and Mishchenko, Konstantin},
booktitle = {NeurIPS 2022 Workshops: OPT},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/defazio2022neuripsw-parameter/}
}