Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback

Abstract

Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction in communication cost. However, its convergence guarantee relies on unrealistic assumptions, and the method can diverge in practice. In this paper, we propose a general distributed compressed SGD method with Nesterov's momentum. We consider two-way compression, which compresses the gradients both to and from workers. Convergence analysis on nonconvex problems for general gradient compressors is provided. By partitioning the gradient into blocks, a blockwise compressor is introduced such that each gradient block is compressed and transmitted in 1-bit format with a scaling factor, leading to a nearly 32x reduction in communication. Experimental results show that the proposed method converges as fast as full-precision distributed momentum SGD and achieves the same testing accuracy. In particular, on distributed ResNet training with 7 workers on ImageNet, the proposed algorithm achieves the same testing accuracy as momentum SGD using full-precision gradients, but with $46\%$ less wall clock time.
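
The blockwise 1-bit compressor and the error-feedback idea named in the title can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the scaling factor is computed, so the sketch assumes the mean absolute value of each block. The function names (`blockwise_sign_compress`, `error_feedback_step`, `compression_ratio`) are hypothetical, and momentum and the two-way (server-to-worker) compression are left out for brevity.

```python
def blockwise_sign_compress(vec, block_sizes):
    """Replace each block by its signs times one per-block scale.

    Assumption (not stated in the abstract): the scale is the mean
    absolute value of the block.
    """
    assert sum(block_sizes) == len(vec), "blocks must cover the vector"
    out = []
    start = 0
    for size in block_sizes:
        block = vec[start:start + size]
        scale = sum(abs(x) for x in block) / size
        # One bit per coordinate (the sign) plus one float per block (the scale).
        out.extend(scale if x >= 0 else -scale for x in block)
        start += size
    return out


def error_feedback_step(grad, residual, block_sizes):
    """Compress grad plus the carried-over residual; return the new residual."""
    corrected = [g + r for g, r in zip(grad, residual)]
    compressed = blockwise_sign_compress(corrected, block_sizes)
    # Whatever compression lost is stored and added back at the next step.
    new_residual = [c - q for c, q in zip(corrected, compressed)]
    return compressed, new_residual


def compression_ratio(block_sizes, float_bits=32):
    """Full-precision bits divided by (1 bit per entry + one float per block)."""
    d = sum(block_sizes)
    return float_bits * d / (d + float_bits * len(block_sizes))


grad = [1.0, -3.0, 2.0, -2.0]
compressed, residual = error_feedback_step(grad, [0.0] * 4, [2, 2])
ratio = compression_ratio([1000, 1000])
```

Under these assumptions, the per-block scaling factor is what keeps the ratio slightly below 32x: with two blocks of 1000 entries each, the ratio is 64000 / 2064, or about 31.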

Cite

Text

Zheng et al. "Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback." Neural Information Processing Systems, 2019.

Markdown

[Zheng et al. "Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/zheng2019neurips-communicationefficient/)

BibTeX

@inproceedings{zheng2019neurips-communicationefficient,
  title     = {{Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback}},
  author    = {Zheng, Shuai and Huang, Ziyue and Kwok, James},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {11450-11460},
  url       = {https://mlanthology.org/neurips/2019/zheng2019neurips-communicationefficient/}
}