A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification

Shi, Shaohuai; Zhao, Kaiyong; Wang, Qiang; Tang, Zhenheng; Chu, Xiaowen

doi:10.24963/IJCAI.2019/473

IJCAI 2019 pp. 3411-3417


Abstract

Gradient sparsification is a promising technique to significantly reduce the communication overhead in decentralized synchronous stochastic gradient descent (S-SGD) algorithms. Yet, many existing gradient sparsification schemes (e.g., Top-k sparsification) have a communication complexity of O(kP), where k is the number of gradients selected by each worker and P is the number of workers. Recently, the gTop-k sparsification scheme has been proposed to reduce the communication complexity from O(kP) to O(k log P), which significantly boosts the system scalability. However, it remains unclear whether the gTop-k sparsification scheme can converge in theory. In this paper, we first provide theoretical proofs of the convergence of the gTop-k scheme for non-convex objective functions under certain analytic assumptions. We then derive the convergence rate of gTop-k S-SGD, which is of the same order as that of vanilla mini-batch SGD. Finally, we conduct extensive experiments on different machine learning models and data sets to verify the soundness of the assumptions and theoretical results, and discuss the impact of the compression ratio on the convergence performance.
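To make the O(k log P) figure concrete, the following is a minimal sketch of a tree-style top-k reduction, not the authors' implementation. The function names (`top_k_sparsify`, `merge_top_k`, `gtop_k`) are hypothetical, the workers are simulated as a list of NumPy arrays in one process, P is assumed to be a power of two, and the residual accumulation that sparsified SGD schemes typically use is omitted.

```python
import numpy as np

def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad; return (indices, values)."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def merge_top_k(a, b, k, dim):
    """Sum two sparse (indices, values) gradients, then re-select the top k."""
    dense = np.zeros(dim)
    np.add.at(dense, a[0], a[1])
    np.add.at(dense, b[0], b[1])
    return top_k_sparsify(dense, k)

def gtop_k(local_grads, k):
    """Pairwise tree reduction (assumes the number of workers is a power of two).

    Each round halves the number of sparse gradients, and every message
    carries only k (index, value) pairs, so there are log2(P) rounds.
    """
    dim = local_grads[0].size
    sparse = [top_k_sparsify(g, k) for g in local_grads]
    rounds = 0
    while len(sparse) > 1:
        sparse = [merge_top_k(sparse[i], sparse[i + 1], k, dim)
                  for i in range(0, len(sparse), 2)]
        rounds += 1
    idx, vals = sparse[0]
    result = np.zeros(dim)
    result[idx] = vals
    return result, rounds

# Four simulated workers, gradient dimension 8, k = 2.
grads = [
    np.array([5.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
    np.array([4.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
    np.array([0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 1.0]),
    np.array([0.0, 0.0, 0.0, 0.0, 0.0, 6.0, 0.5, 0.0]),
]
result, rounds = gtop_k(grads, k=2)
print(rounds, result)
```

With P = 4 the reduction finishes in two rounds. Because entries are discarded at every merge, the result only approximates the top-k of the fully summed gradient, which is one reason a convergence analysis of such a scheme is not immediate.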

Cite

Text

Shi et al. "A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification." International Joint Conference on Artificial Intelligence, 2019. doi:10.24963/IJCAI.2019/473

Markdown

[Shi et al. "A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification." International Joint Conference on Artificial Intelligence, 2019.](https://mlanthology.org/ijcai/2019/shi2019ijcai-convergence/) doi:10.24963/IJCAI.2019/473

BibTeX

@inproceedings{shi2019ijcai-convergence,
  title     = {{A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification}},
  author    = {Shi, Shaohuai and Zhao, Kaiyong and Wang, Qiang and Tang, Zhenheng and Chu, Xiaowen},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {3411--3417},
  doi       = {10.24963/IJCAI.2019/473},
  url       = {https://mlanthology.org/ijcai/2019/shi2019ijcai-convergence/}
}