Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes
Abstract
Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) in which multiple threads access a common repository of training data in parallel, perform SGD iterations, and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way, and we wish to move SGD computations to the local compute nodes where the local data resides. The results of these local SGD computations are aggregated by a central “aggregator” which mimics Hogwild!. We show how local compute nodes can start with small mini-batch sizes that increase to larger ones in order to reduce communication cost (i.e., the number of rounds of interaction with the aggregator). We improve on the state-of-the-art literature and show O(√K) communication rounds for heterogeneous data for strongly convex problems, where K is the total number of gradient computations across all local compute nodes. For our scheme, we provide a tight, novel, and non-trivial convergence analysis for strongly convex problems with heterogeneous data that does not rely on the bounded gradient assumption made in many existing publications. The tightness follows from our proofs of lower and upper bounds on the convergence rate, which differ by only a constant factor. We show experimental results for plain convex and non-convex problems with biased (i.e., heterogeneous) and unbiased local data sets.
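To make the O(√K) round count concrete: if, in round i, each of n nodes computes a mini-batch of size s_i = a·i + b, then after T rounds about n·a·T²/2 gradients have been computed, so a total budget of K gradients is reached after roughly √(2K/(a·n)) rounds. The Python sketch below only counts communication rounds under that assumed linear schedule. The parameters `a`, `b`, `num_nodes` and both function names are hypothetical illustrations, not the paper's notation or code, and the sketch does not model SGD, the aggregator, or convergence.

```python
import math


def linear_schedule_rounds(total_gradients: int, a: int, b: int, num_nodes: int = 1) -> list[int]:
    """Per-round gradient counts under a hypothetical linear mini-batch schedule.

    In round i (i = 0, 1, 2, ...) every node computes s_i = a*i + b stochastic
    gradients before one interaction with the aggregator, so the round consumes
    num_nodes * s_i gradients. The last round is truncated so that the counts
    sum to exactly total_gradients (K in the abstract).
    """
    if a < 0 or b < 1 or num_nodes < 1:
        raise ValueError("need a >= 0, b >= 1, num_nodes >= 1")
    counts = []
    remaining = total_gradients
    i = 0
    while remaining > 0:
        used = min(num_nodes * (a * i + b), remaining)
        counts.append(used)
        remaining -= used
        i += 1
    return counts


def constant_schedule_rounds(total_gradients: int, s: int, num_nodes: int = 1) -> int:
    """Number of rounds when every node always uses the same mini-batch size s."""
    per_round = num_nodes * s
    return -(-total_gradients // per_round)  # ceiling division


if __name__ == "__main__":
    for K in (10**4, 10**6, 10**8):
        t_linear = len(linear_schedule_rounds(K, a=1, b=1))
        t_const = constant_schedule_rounds(K, s=100)
        print(f"K={K:>11,}  linear: {t_linear:>6} rounds "
              f"(sqrt(2K)={math.sqrt(2 * K):8.1f})  constant s=100: {t_const:>9} rounds")
```

With a = b = 1 on a single node, K = 10^6 takes 1414 rounds (close to √(2·10^6) ≈ 1414.2), while a fixed mini-batch size of 100 takes 10,000 rounds. A fixed size s needs about K/s rounds, which grows linearly in K, whereas the linear schedule's round count grows only like √K; for small K a large fixed batch can still need fewer rounds, which is why the comparison is about growth rate.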
Cite
Nguyen et al. "Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes." Artificial Intelligence and Statistics, 2021.
@inproceedings{nguyen2021aistats-hogwild,
title = {{Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes}},
author = {Nguyen, Nhuong and Nguyen, Toan and Ha Nguyen, Phuong and Tran-Dinh, Quoc and Nguyen, Lam and van Dijk, Marten},
booktitle = {Artificial Intelligence and Statistics},
year = {2021},
pages = {1207--1215},
volume = {130},
url = {https://mlanthology.org/aistats/2021/nguyen2021aistats-hogwild/}
}