Making SGD Parameter-Free

Abstract

We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting. In contrast, the best previously known rates for parameter-free SCO are based on online parameter-free regret bounds, which contain unavoidable excess logarithmic terms compared to their known-parameter counterparts. Our algorithm is conceptually simple, has high-probability guarantees, and is also partially adaptive to unknown gradient norms, smoothness, and strong convexity. At the heart of our results is a novel parameter-free certificate for SGD step size choice, and a time-uniform concentration result that assumes no a priori bounds on SGD iterates.
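To make the "parameter-free" setting concrete, below is a minimal sketch of the generic idea of removing SGD's step-size parameter by trying geometrically spaced step sizes and keeping the best averaged iterate. This is an illustration only, not the authors' algorithm or their certificate; the names sgd_run, parameter_free_sgd, and the noiseless objective check are assumptions made for the example.

# Minimal sketch (assumed, not the paper's method): geometric grid search over
# SGD step sizes, selecting the averaged iterate with the smallest objective.
import numpy as np

def sgd_run(grad, x0, eta, num_steps, rng):
    """Plain SGD with a fixed step size eta; returns the averaged iterate."""
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    for _ in range(num_steps):
        g = grad(x, rng)        # stochastic gradient oracle
        x = x - eta * g
        avg += x
    return avg / num_steps

def parameter_free_sgd(grad, objective, x0, num_steps, rng, grid_size=8):
    """Try step sizes 1, 1/2, 1/4, ... and keep the candidate with the
    smallest (assumed exactly computable) objective value."""
    candidates = [sgd_run(grad, x0, 2.0 ** (-k), num_steps, rng)
                  for k in range(grid_size)]
    return min(candidates, key=objective)

# Usage example on a noisy quadratic f(x) = 0.5 * ||x||^2.
rng = np.random.default_rng(0)
grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
objective = lambda x: 0.5 * float(x @ x)
x_hat = parameter_free_sgd(grad, objective, np.ones(5), num_steps=200, rng=rng)
print(objective(x_hat))

A grid search like this pays a logarithmic overhead in the number of step sizes tried; the paper's contribution is to reduce the overall overhead to a double-logarithmic factor via a certificate for the step-size choice and a time-uniform concentration bound.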

Cite

Text

Carmon and Hinder. "Making SGD Parameter-Free." Conference on Learning Theory, 2022.

Markdown

[Carmon and Hinder. "Making SGD Parameter-Free." Conference on Learning Theory, 2022.](https://mlanthology.org/colt/2022/carmon2022colt-making/)

BibTeX

@inproceedings{carmon2022colt-making,
  title     = {{Making SGD Parameter-Free}},
  author    = {Carmon, Yair and Hinder, Oliver},
  booktitle = {Conference on Learning Theory},
  year      = {2022},
  pages     = {2360--2389},
  volume    = {178},
  url       = {https://mlanthology.org/colt/2022/carmon2022colt-making/}
}