Fast Convergence of Stochastic Subgradient Method Under Interpolation
Abstract
This paper studies the behaviour of the stochastic subgradient descent (SSGD) method applied to over-parameterized nonsmooth optimization problems that satisfy an interpolation condition. By leveraging the composite structure of these empirical risk minimization problems, we prove that SSGD converges with rates $O(1/\epsilon)$ and $O(\log(1/\epsilon))$ for convex and strongly convex objectives, respectively, when interpolation holds. These rates coincide with established rates for the stochastic gradient descent (SGD) method applied to smooth problems that also satisfy an interpolation condition. Our analysis provides a partial explanation for the empirical observation that SGD and SSGD sometimes behave similarly when training smooth and nonsmooth machine learning models. We also prove that the rate $O(1/\epsilon)$ is optimal for the subgradient method in the convex interpolation setting.
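To make the setting concrete, here is a minimal sketch of stochastic subgradient descent on a nonsmooth hinge-loss problem where interpolation holds (some weight vector attains zero loss on every sample). The synthetic data, constant step size, and iteration count are illustrative assumptions, not the paper's experiments or step-size schedule.

```python
# Sketch: SSGD on a nonsmooth, interpolated hinge-loss problem (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d = 10
w_star = rng.normal(size=d)

# Keep only points with margin |x . w_star| >= 1, so w_star itself achieves
# zero hinge loss on every sample, i.e. the interpolation condition holds.
X_all = rng.normal(size=(800, d))
mask = np.abs(X_all @ w_star) >= 1.0
X = X_all[mask]
y = np.sign(X @ w_star)
n = len(X)

def hinge_loss(w, x, yi):
    return max(0.0, 1.0 - yi * (x @ w))

def hinge_subgrad(w, x, yi):
    # One subgradient of the (nonsmooth) hinge loss at w.
    return -yi * x if 1.0 - yi * (x @ w) > 0 else np.zeros_like(w)

w = np.zeros(d)
step = 0.1  # constant step size, chosen for illustration only
for t in range(5000):
    i = rng.integers(n)                        # sample one data point
    w -= step * hinge_subgrad(w, X[i], y[i])   # stochastic subgradient update

avg_loss = np.mean([hinge_loss(w, X[i], y[i]) for i in range(n)])
print(f"average hinge loss after training: {avg_loss:.4f}")
```

Under interpolation, the paper's result suggests the training loss in such a setup decays at the fast rates quoted above, mirroring SGD on smooth interpolated problems.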
Cite
Text
Fang et al. "Fast Convergence of Stochastic Subgradient Method Under Interpolation." International Conference on Learning Representations, 2021.
Markdown
[Fang et al. "Fast Convergence of Stochastic Subgradient Method Under Interpolation." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/fang2021iclr-fast/)
BibTeX
@inproceedings{fang2021iclr-fast,
title = {{Fast Convergence of Stochastic Subgradient Method Under Interpolation}},
author = {Fang, Huang and Fan, Zhenan and Friedlander, Michael},
booktitle = {International Conference on Learning Representations},
year = {2021},
url = {https://mlanthology.org/iclr/2021/fang2021iclr-fast/}
}