Monte Carlo Gradient Quantization

Abstract

We propose Monte Carlo methods that leverage both sparsity and quantization to compress neural-network gradients throughout training. Besides reducing the communication exchanged between workers in a distributed setting, our approach also improves the computational efficiency of each worker. Our method, called Monte Carlo Gradient Quantization (MCGQ), converges faster and reaches higher performance than existing quantization methods on image classification and language modeling. Combining low-bit-width quantization with high sparsity levels, MCGQ more than doubles the compression rates of existing methods, from 200× to 520× and from 462× to more than 1200× on different language modeling tasks.
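
The abstract does not detail the algorithm, but the core Monte Carlo idea behind gradient quantization can be illustrated with stochastic rounding: each gradient entry is rounded up or down at random so that the quantized value is an unbiased estimate of the original. The sketch below is an illustrative assumption, not the paper's exact MCGQ scheme; the function name and the symmetric signed grid are hypothetical choices.

```python
import numpy as np

def mc_quantize(grad, bits=2, rng=None):
    """Stochastically round a gradient vector onto a low-bit grid.

    Illustrative sketch only: stochastic rounding makes each quantized
    entry an unbiased estimate of the original (E[q] = grad), which is
    the core Monte Carlo idea. The paper's actual MCGQ method may differ.
    """
    rng = np.random.default_rng(rng)
    scale = float(np.max(np.abs(grad))) or 1.0
    levels = 2 ** (bits - 1) - 1           # symmetric signed grid, e.g. {-1, 0, 1} for 2 bits
    x = grad / scale * levels              # map entries into [-levels, levels]
    lower = np.floor(x)
    # round up with probability equal to the fractional part -> unbiased
    q = lower + (rng.random(x.shape) < (x - lower))
    return q * scale / levels              # dequantized estimate
```

Note that small entries round to exactly zero with high probability, so stochastic rounding to a coarse grid induces sparsity as well as quantization, which is consistent with the abstract's claim of combining both.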

Cite

Text

Mordido et al. "Monte Carlo Gradient Quantization." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020. doi:10.1109/CVPRW50498.2020.00367

Markdown

[Mordido et al. "Monte Carlo Gradient Quantization." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.](https://mlanthology.org/cvprw/2020/mordido2020cvprw-monte/) doi:10.1109/CVPRW50498.2020.00367

BibTeX

@inproceedings{mordido2020cvprw-monte,
  title     = {{Monte Carlo Gradient Quantization}},
  author    = {Mordido, Gonçalo and Van Keirsbilck, Matthijs and Keller, Alexander},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2020},
  pages     = {3087--3095},
  doi       = {10.1109/CVPRW50498.2020.00367},
  url       = {https://mlanthology.org/cvprw/2020/mordido2020cvprw-monte/}
}