Parallelized Stochastic Gradient Descent
Abstract
With the increase in available data, parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm, including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms, our variant comes with parallel acceleration guarantees and it imposes no overly tight latency constraints, which might only be available in the multicore setting. Our analysis introduces a novel proof technique, contractive mappings, to quantify the speed of convergence of parameter distributions to their asymptotic limits. As a side effect, this answers the question of how quickly stochastic gradient descent algorithms reach the asymptotically normal regime.
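The parallel SGD variant analyzed in this paper partitions the data across machines, runs stochastic gradient descent independently on each shard, and then averages the resulting parameter vectors. The snippet below is a minimal NumPy sketch of that parameter-averaging scheme, not the paper's implementation: the function names (`local_sgd`, `parallel_sgd`), the squared-loss linear model, and all hyperparameters are illustrative assumptions, and the workers are simulated sequentially rather than run on separate machines.

```python
import numpy as np

def local_sgd(X, y, lr=0.01, epochs=5, seed=0):
    """Plain SGD on one worker's data shard (squared loss, linear model)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x.w - y)^2
            w -= lr * grad
    return w

def parallel_sgd(X, y, num_workers=4, lr=0.01, epochs=5):
    """Split the data into shards, run SGD independently on each shard,
    then average the resulting weight vectors (simulated sequentially here)."""
    shards = zip(np.array_split(X, num_workers), np.array_split(y, num_workers))
    ws = [local_sgd(Xk, yk, lr, epochs, seed=k) for k, (Xk, yk) in enumerate(shards)]
    return np.mean(ws, axis=0)

# Usage example on synthetic linear data (hypothetical setup).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.1 * rng.normal(size=1000)
print(parallel_sgd(X, y))  # should land near w_true
```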
Cite
Text
Zinkevich et al. "Parallelized Stochastic Gradient Descent." Neural Information Processing Systems, 2010.

Markdown

[Zinkevich et al. "Parallelized Stochastic Gradient Descent." Neural Information Processing Systems, 2010.](https://mlanthology.org/neurips/2010/zinkevich2010neurips-parallelized/)

BibTeX
@inproceedings{zinkevich2010neurips-parallelized,
title = {{Parallelized Stochastic Gradient Descent}},
author = {Zinkevich, Martin and Weimer, Markus and Li, Lihong and Smola, Alex J.},
booktitle = {Neural Information Processing Systems},
year = {2010},
pages = {2595-2603},
url = {https://mlanthology.org/neurips/2010/zinkevich2010neurips-parallelized/}
}