Data-Dependent Stability of Stochastic Gradient Descent

Abstract

We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD) and employ it to develop novel generalization bounds. This is in contrast to previous distribution-free algorithmic stability results for SGD, which depend on worst-case constants. By virtue of the data-dependent argument, our bounds provide new insights into learning with SGD on convex and non-convex problems. In the convex case, we show that the bound on the generalization error depends on the risk at the initialization point. In the non-convex case, we prove that the expected curvature of the objective function around the initialization point has a crucial influence on the generalization error. In both cases, our results suggest a simple data-driven strategy for stabilizing SGD by pre-screening its initialization. As a corollary, our results yield optimistic generalization bounds that exhibit fast convergence rates for SGD, subject to a vanishing empirical risk and low noise in the stochastic gradients.
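
To make the pre-screening idea concrete, here is a minimal NumPy sketch, not the authors' procedure, of one way such a data-driven initialization screen could look: several candidate starting points are scored by their empirical risk on a small screening subset, and SGD is launched from the lowest-risk candidate. The logistic loss, the number of candidates, and the helper names (prescreen_init, sgd) are illustrative assumptions.

```python
# A minimal sketch (not the paper's code) of pre-screening SGD initializations:
# score candidate starting points by empirical risk on a screening subset,
# then run SGD from the lowest-risk candidate.
import numpy as np

def logistic_loss(w, X, y):
    """Average logistic loss of the linear model w on (X, y), with y in {-1, +1}."""
    margins = y * (X @ w)
    return np.mean(np.log1p(np.exp(-margins)))

def sgd(w0, X, y, lr=0.1, epochs=5, rng=None):
    """Plain SGD on the logistic loss, one example per step."""
    rng = rng if rng is not None else np.random.default_rng(0)
    w = w0.copy()
    n = len(y)
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w)
            grad = -y[i] * X[i] / (1.0 + np.exp(margin))  # d/dw log(1 + exp(-y x.w))
            w -= lr * grad
    return w

def prescreen_init(candidates, X_screen, y_screen):
    """Return the candidate initialization with the smallest empirical risk."""
    risks = [logistic_loss(w, X_screen, y_screen) for w in candidates]
    return candidates[int(np.argmin(risks))]

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, d = 500, 20
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = np.sign(X @ w_true + 0.1 * rng.normal(size=n))

    # Candidate starting points; the screening set is a small slice of the data.
    candidates = [rng.normal(size=d) for _ in range(10)]
    w0 = prescreen_init(candidates, X[:100], y[:100])
    w_final = sgd(w0, X, y, rng=rng)
    print("risk at chosen init :", logistic_loss(w0, X, y))
    print("risk after SGD      :", logistic_loss(w_final, X, y))
```

In the spirit of the paper's results, a lower risk (convex case) or flatter expected curvature (non-convex case) at the starting point is what makes such a screen plausible as a stabilizer; the sketch above only illustrates the risk-based variant.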

Cite

Text

Kuzborskij and Lampert. "Data-Dependent Stability of Stochastic Gradient Descent." International Conference on Machine Learning, 2018.

Markdown

[Kuzborskij and Lampert. "Data-Dependent Stability of Stochastic Gradient Descent." International Conference on Machine Learning, 2018.](https://mlanthology.org/icml/2018/kuzborskij2018icml-datadependent/)

BibTeX

@inproceedings{kuzborskij2018icml-datadependent,
  title     = {{Data-Dependent Stability of Stochastic Gradient Descent}},
  author    = {Kuzborskij, Ilja and Lampert, Christoph},
  booktitle = {International Conference on Machine Learning},
  year      = {2018},
  pages     = {2815--2824},
  volume    = {80},
  url       = {https://mlanthology.org/icml/2018/kuzborskij2018icml-datadependent/}
}