Is Local SGD Better than Minibatch SGD?
Abstract
We study local SGD (also known as parallel SGD and federated SGD), a natural and frequently used distributed optimization method. Its theoretical foundations are currently lacking and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal for quadratics; (2) For general convex objectives we provide the first guarantee that at least \emph{sometimes} improves over minibatch SGD, but our guarantee does not always improve over, nor even match, minibatch SGD; (3) We show that indeed local SGD does \emph{not} dominate minibatch SGD by presenting a lower bound on the performance of local SGD that is worse than the minibatch SGD guarantee.
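To make the comparison concrete, below is a minimal NumPy sketch (not the paper's code; the dimensions, step size, and noise level are illustrative assumptions) of the two methods on a synthetic quadratic least-squares problem. Both methods are given the same budget: M machines, R communication rounds, and K stochastic gradients per machine per round.

```python
# Minimal sketch contrasting local SGD and minibatch SGD on a toy quadratic
# objective f(w) = E[(x^T w - y)^2 / 2]. All hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, M, R, K, lr = 10, 8, 50, 16, 0.05
w_star = rng.normal(size=d)                      # synthetic ground-truth model

def stochastic_grad(w):
    """One-sample stochastic gradient of the quadratic objective."""
    x = rng.normal(size=d)
    y = x @ w_star + 0.1 * rng.normal()
    return (x @ w - y) * x

def local_sgd(w0):
    """Each machine takes K local SGD steps per round; iterates are averaged each round."""
    w = w0.copy()
    for _ in range(R):
        local_iterates = []
        for _ in range(M):
            w_m = w.copy()
            for _ in range(K):
                w_m -= lr * stochastic_grad(w_m)
            local_iterates.append(w_m)
        w = np.mean(local_iterates, axis=0)      # communication: average the M local models
    return w

def minibatch_sgd(w0):
    """Baseline: each round, all M*K gradients are evaluated at the shared iterate,
    and one step is taken with the averaged (minibatch) gradient."""
    w = w0.copy()
    for _ in range(R):
        g = np.mean([stochastic_grad(w) for _ in range(M * K)], axis=0)
        w -= lr * g
    return w

w0 = np.zeros(d)
for name, method in [("local SGD", local_sgd), ("minibatch SGD", minibatch_sgd)]:
    print(f"{name}: distance to optimum = {np.linalg.norm(method(w0) - w_star):.4f}")
```

Both routines use the same total number of stochastic gradients (M·K·R) and the same number of communication rounds (R), which is the regime in which the paper compares the two methods; a single shared step size is used here only for simplicity.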
Cite
Text
Woodworth et al. "Is Local SGD Better than Minibatch SGD?." International Conference on Machine Learning, 2020.
Markdown
[Woodworth et al. "Is Local SGD Better than Minibatch SGD?." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/woodworth2020icml-local/)
BibTeX
@inproceedings{woodworth2020icml-local,
title = {{Is Local SGD Better than Minibatch SGD?}},
author = {Woodworth, Blake and Patel, Kumar Kshitij and Stich, Sebastian and Dai, Zhen and Bullins, Brian and McMahan, Brendan and Shamir, Ohad and Srebro, Nathan},
booktitle = {International Conference on Machine Learning},
year = {2020},
pages = {10334--10343},
volume = {119},
url = {https://mlanthology.org/icml/2020/woodworth2020icml-local/}
}