New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity
Abstract
As an incremental-gradient algorithm, hybrid stochastic gradient descent (HSGD) enjoys the merits of both stochastic and full gradient methods for finite-sum minimization problems. However, the existing rate-of-convergence analysis for HSGD assumes with-replacement sampling (WRS) and is restricted to convex problems. It is not clear whether HSGD retains these advantages under the common practice of without-replacement sampling (WoRS) and for non-convex problems. In this paper, we answer this open question affirmatively by showing that, under WoRS and for both convex and non-convex problems, HSGD with a constant step size can still match full gradient descent in rate of convergence, while maintaining a sample-size-independent incremental first-order oracle (IFO) complexity comparable to that of stochastic gradient descent. For a special class of finite-sum problems with linear prediction models, our convergence results can be further improved in some cases. Extensive numerical results confirm our theoretical findings and demonstrate the favorable efficiency of WoRS-based HSGD.
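For intuition, below is a minimal Python sketch of the general hybrid idea the abstract refers to: finite-sum minimization with without-replacement (WoRS) mini-batches whose size grows over epochs, so the updates interpolate between plain SGD and full gradient descent. The function grad_fn, the growth schedule, and all parameter names are illustrative assumptions, not the exact HSGD estimator analyzed in the paper.

import numpy as np

def hybrid_sgd_wors(grad_fn, x0, n, epochs, step_size=0.1, batch0=8, growth=1.5, rng=None):
    # Illustrative sketch only (not the paper's exact HSGD estimator):
    # minimize (1/n) * sum_i f_i(x) with a constant step size, sampling
    # mini-batches without replacement and growing the batch size toward n.
    #   grad_fn(x, idx): gradient of the average of f_i over the indices `idx`
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    batch = batch0
    for _ in range(epochs):
        perm = rng.permutation(n)                     # WoRS: shuffle once per epoch
        for start in range(0, n, batch):
            idx = perm[start:start + batch]           # disjoint batches, no re-sampling
            x -= step_size * grad_fn(x, idx)          # constant step-size update
        batch = min(n, int(np.ceil(batch * growth)))  # batch grows toward the full gradient
    return x

With batch0 = n this sketch reduces to full gradient descent, and with growth = 1 it stays a fixed-size WoRS mini-batch SGD; the abstract's claim concerns how a hybrid scheme between these extremes can match full gradient descent in convergence rate while keeping an SGD-like, sample-size-independent IFO complexity.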
Cite

Text:
Zhou et al. "New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity." Neural Information Processing Systems, 2018.

BibTeX:
@inproceedings{zhou2018neurips-new,
title = {{New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity}},
author = {Zhou, Pan and Yuan, Xiaotong and Feng, Jiashi},
booktitle = {Neural Information Processing Systems},
year = {2018},
pages = {1234-1243},
url = {https://mlanthology.org/neurips/2018/zhou2018neurips-new/}
}