Never Go Full Batch (in Stochastic Convex Optimization)

Abstract

We study the generalization performance of *full-batch* optimization algorithms for stochastic convex optimization: these are first-order methods that only access the exact gradient of the empirical risk (rather than gradients with respect to individual data points), and include a wide range of algorithms such as gradient descent, mirror descent, and their regularized and/or accelerated variants. We provide a new separation result showing that, while algorithms such as stochastic gradient descent can generalize and optimize the population risk to within $\epsilon$ after $O(1/\epsilon^2)$ iterations, full-batch methods either need at least $\Omega(1/\epsilon^4)$ iterations or exhibit a dimension-dependent sample complexity.
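The separation hinges on two gradient-oracle models: stochastic methods query gradients at individual samples, while full-batch methods only ever see the exact gradient of the empirical risk. The minimal Python sketch below makes that access distinction concrete; the least-squares instance, step sizes, and iteration budgets are illustrative assumptions chosen for demonstration, not the lower-bound construction or the rates established in the paper.

```python
import numpy as np

# Illustrative sketch of the two oracle models the abstract contrasts.
# Assumptions (not from the paper): a simple noisy least-squares problem,
# hand-picked step sizes, and equal iteration budgets for both methods.
#   - SGD: each step uses the gradient at a single sampled data point.
#   - Full-batch GD: each step uses the exact gradient of the empirical risk.

rng = np.random.default_rng(0)
d, n = 20, 200
w_star = rng.normal(size=d)                 # population minimizer
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)   # noisy labels

def sample_grad(w, i):
    """Gradient of the squared loss at a single data point i (stochastic oracle)."""
    return (X[i] @ w - y[i]) * X[i]

def full_batch_grad(w):
    """Exact gradient of the empirical risk (full-batch oracle)."""
    return X.T @ (X @ w - y) / n

def excess_risk_proxy(w):
    """Squared distance to the population minimizer, as a stand-in for population risk."""
    return float(np.sum((w - w_star) ** 2))

# SGD: one single-sample gradient per iteration.
w = np.zeros(d)
for _ in range(n):
    w -= 0.01 * sample_grad(w, rng.integers(n))
print("SGD  excess-risk proxy:", excess_risk_proxy(w))

# Full-batch GD: one exact empirical gradient per iteration.
w = np.zeros(d)
for _ in range(n):
    w -= 0.1 * full_batch_grad(w)
print("GD   excess-risk proxy:", excess_risk_proxy(w))
```

On this toy instance both methods optimize well; the paper's result concerns worst-case stochastic convex problems, where restricting an algorithm to the full-batch oracle forces either $\Omega(1/\epsilon^4)$ iterations or a dimension-dependent sample complexity.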

Cite

Text

Amir et al. "Never Go Full Batch (in Stochastic Convex Optimization)." Neural Information Processing Systems, 2021.

Markdown

[Amir et al. "Never Go Full Batch (in Stochastic Convex Optimization)." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/amir2021neurips-never/)

BibTeX

@inproceedings{amir2021neurips-never,
  title     = {{Never Go Full Batch (in Stochastic Convex Optimization)}},
  author    = {Amir, Idan and Carmon, Yair and Koren, Tomer and Livni, Roi},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/amir2021neurips-never/}
}