A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification
Abstract
Gradient sparsification is a promising technique to significantly reduce the communication overhead in decentralized synchronous stochastic gradient descent (S-SGD) algorithms. Yet, many existing gradient sparsification schemes (e.g., Top-k sparsification) have a communication complexity of O(kP), where k is the number of gradients selected by each worker and P is the number of workers. Recently, the gTop-k sparsification scheme has been proposed to reduce the communication complexity from O(kP) to O(k log P), which significantly boosts system scalability. However, it remains unclear whether the gTop-k sparsification scheme can converge in theory. In this paper, we first provide theoretical proofs of the convergence of the gTop-k scheme for non-convex objective functions under certain analytic assumptions. We then derive the convergence rate of gTop-k S-SGD, which is of the same order as that of vanilla mini-batch SGD. Finally, we conduct extensive experiments on different machine learning models and data sets to verify the soundness of the assumptions and theoretical results, and discuss the impact of the compression ratio on the convergence performance.
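To make the O(kP) versus O(k log P) contrast concrete, the following is a minimal single-process sketch, not the authors' implementation. It assumes that gTop-k can be modeled as a pairwise tree reduction in which each merge sums two sparse gradients and keeps only the k largest-magnitude entries; the abstract does not spell out the communication pattern, so that structure is an assumption. The function names `top_k`, `merge_top_k`, and `gtop_k` are hypothetical.

```python
def top_k(grad, k):
    """Keep the k largest-magnitude entries of a dense gradient as {index: value}."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    return {i: grad[i] for i in order[:k]}


def merge_top_k(a, b, k):
    """Sum two sparse gradients, then keep only the k largest-magnitude entries."""
    summed = dict(a)
    for i, v in b.items():
        summed[i] = summed.get(i, 0.0) + v
    keep = sorted(summed, key=lambda i: abs(summed[i]), reverse=True)[:k]
    return {i: summed[i] for i in keep}


def gtop_k(worker_grads, k):
    """Simulate an assumed pairwise tree reduction over all workers.

    Returns the global sparse gradient and the number of merge rounds.
    Every message exchanged in a round carries at most k entries.
    """
    sparse = [top_k(g, k) for g in worker_grads]  # local Top-k on each worker
    rounds = 0
    while len(sparse) > 1:
        merged = [
            merge_top_k(sparse[i], sparse[i + 1], k)
            for i in range(0, len(sparse) - 1, 2)
        ]
        if len(sparse) % 2:  # an unpaired worker passes through to the next round
            merged.append(sparse[-1])
        sparse = merged
        rounds += 1
    return sparse[0], rounds


if __name__ == "__main__":
    # Four simulated workers (P = 4), each holding a 4-dimensional gradient.
    grads = [
        [0.9, -0.1, 0.2, 0.0],
        [0.8, 0.3, -0.1, 0.0],
        [-0.1, 0.15, 0.7, 0.05],
        [0.2, -0.6, 0.1, 0.0],
    ]
    result, rounds = gtop_k(grads, k=2)
    print(result, rounds)
```

Under this assumption, P workers need about log2(P) merge rounds of at most k entries each, which is where a cost of O(k log P) comes from. Gathering every worker's k selected entries instead would move on the order of kP entries.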
Cite
Text
Shi et al. "A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification." International Joint Conference on Artificial Intelligence, 2019. doi:10.24963/IJCAI.2019/473
Markdown
[Shi et al. "A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification." International Joint Conference on Artificial Intelligence, 2019.](https://mlanthology.org/ijcai/2019/shi2019ijcai-convergence/) doi:10.24963/IJCAI.2019/473
BibTeX
@inproceedings{shi2019ijcai-convergence,
title = {{A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification}},
author = {Shi, Shaohuai and Zhao, Kaiyong and Wang, Qiang and Tang, Zhenheng and Chu, Xiaowen},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2019},
pages = {3411--3417},
doi = {10.24963/IJCAI.2019/473},
url = {https://mlanthology.org/ijcai/2019/shi2019ijcai-convergence/}
}