Distributed Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement
Abstract
We study the round complexity of minimizing the average of convex functions under a new setting of distributed optimization where each machine can receive two subsets of functions: the first subset comes from a random partition, and the second subset is sampled randomly with replacement. Under this setting, we define a broad class of distributed algorithms whose local computation can utilize both subsets, and we design a distributed stochastic variance reduced gradient method belonging to this class. When the condition number of the problem is small, our method achieves the optimal parallel runtime, amount of communication, and number of rounds of communication among all distributed first-order methods, up to constant factors. When the condition number is relatively large, we provide a lower bound on the number of rounds of communication needed by any algorithm in this class. We then present an accelerated version of our method whose number of rounds of communication matches the lower bound up to logarithmic factors, which establishes that this accelerated algorithm has the lowest round complexity among all algorithms in our class under this new setting.
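To make the setting concrete, below is a minimal, hypothetical sketch (not the paper's exact algorithm) of a distributed SVRG-style loop in NumPy for a ridge-regularized least-squares problem. The problem, step size, number of rounds, and number of local steps are illustrative assumptions; the only features taken from the abstract are that each machine holds a block of a random partition plus an extra block sampled with replacement, that each communication round aggregates a full gradient and averages local iterates, and that local steps are variance reduced.

```python
import numpy as np

# Illustrative sketch only: distributed SVRG-style updates where each machine
# uses its partition block plus extra with-replacement samples for local steps.

rng = np.random.default_rng(0)
n, d, m = 1200, 20, 4              # examples, features, machines (assumed sizes)
lam = 1e-2                         # ridge parameter (assumed)
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def grad_i(w, i):
    """Gradient of f_i(w) = 0.5*(a_i^T w - b_i)^2 + 0.5*lam*||w||^2."""
    return (A[i] @ w - b[i]) * A[i] + lam * w

def full_grad(w):
    return A.T @ (A @ w - b) / n + lam * w

# Random partition (without replacement) plus extra samples (with replacement).
perm = rng.permutation(n)
partition = np.array_split(perm, m)
extra = [rng.integers(0, n, size=n // m) for _ in range(m)]

w = np.zeros(d)
for rnd in range(30):                       # communication rounds
    g_bar = full_grad(w)                    # aggregate the full gradient
    local_iterates = []
    for k in range(m):                      # local computation on machine k
        idx = np.concatenate([partition[k], extra[k]])
        wk, eta = w.copy(), 0.05            # step size is an assumption
        for i in rng.choice(idx, size=2 * len(idx)):
            # SVRG-style variance-reduced stochastic gradient step
            wk -= eta * (grad_i(wk, i) - grad_i(w, i) + g_bar)
        local_iterates.append(wk)
    w = np.mean(local_iterates, axis=0)     # average local iterates

print("final objective:", 0.5 * np.mean((A @ w - b) ** 2) + 0.5 * lam * w @ w)
```

Each outer iteration corresponds to a constant number of communication rounds (gradient aggregation and iterate averaging), while all stochastic steps stay local to the machines.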
Cite
Text
Lee et al. "Distributed Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement." Journal of Machine Learning Research, 2017.
Markdown
[Lee et al. "Distributed Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement." Journal of Machine Learning Research, 2017.](https://mlanthology.org/jmlr/2017/lee2017jmlr-distributed/)
BibTeX
@article{lee2017jmlr-distributed,
title = {{Distributed Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement}},
author = {Lee, Jason D. and Lin, Qihang and Ma, Tengyu and Yang, Tianbao},
journal = {Journal of Machine Learning Research},
year = {2017},
pages = {1-43},
volume = {18},
url = {https://mlanthology.org/jmlr/2017/lee2017jmlr-distributed/}
}