Gradient Diversity: A Key Ingredient for Scalable Distributed Learning

Abstract

It has been experimentally observed that distributed implementations of mini-batch stochastic gradient descent (SGD) algorithms exhibit speedup saturation and decaying generalization ability beyond a particular batch size. In this work, we present an analysis hinting that high similarity between concurrently processed gradients may be a cause of this performance degradation. We introduce the notion of gradient diversity, which measures the dissimilarity between concurrent gradient updates, and show its key role in the convergence and generalization performance of mini-batch SGD. We also establish that heuristics similar to DropConnect, Langevin dynamics, and quantization are provably diversity-inducing mechanisms, and provide experimental evidence indicating that these mechanisms can indeed enable the use of larger batches without sacrificing accuracy and lead to faster training in distributed learning. For example, in one of our experiments, a diversity-inducing mechanism reduces by 30% the training time needed for a convolutional neural network to reach 95% training accuracy on MNIST in the distributed setting.
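
To make the abstract's central quantity concrete, the sketch below computes one natural measure of dissimilarity among a set of per-example gradients: the sum of their squared norms divided by the squared norm of their sum, which is smallest when all gradients coincide and largest when they are mutually orthogonal. This is a minimal NumPy illustration based only on the abstract's description; the function name and the exact normalization used in the paper are assumptions and may differ from the authors' definition.

    import numpy as np

    def gradient_diversity(grads):
        # grads: array of shape (n, d), one gradient per row.
        # Returns (sum of squared per-gradient norms) / (squared norm of the summed gradient).
        # Equals 1/n when all n gradients are identical and 1 when they are
        # mutually orthogonal with equal norms, so larger values indicate
        # more dissimilar ("more diverse") concurrent gradients.
        grads = np.asarray(grads, dtype=float)
        sum_of_squared_norms = np.sum(grads ** 2)
        squared_norm_of_sum = np.sum(grads.sum(axis=0) ** 2)
        return sum_of_squared_norms / squared_norm_of_sum

    # Illustrative (hypothetical) examples:
    identical = np.tile([1.0, 0.0, 0.0], (3, 1))   # three copies of the same gradient
    orthogonal = np.eye(3)                         # three mutually orthogonal gradients
    print(gradient_diversity(identical))   # ~0.33 (low diversity)
    print(gradient_diversity(orthogonal))  # 1.0  (high diversity)

Intuitively, when this ratio is large, the concurrently processed gradients carry less redundant information, which is the regime in which the abstract argues larger batches remain useful.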

Cite

Text

Yin et al. "Gradient Diversity: A Key Ingredient for Scalable Distributed Learning." International Conference on Artificial Intelligence and Statistics, 2018.

Markdown

[Yin et al. "Gradient Diversity: A Key Ingredient for Scalable Distributed Learning." International Conference on Artificial Intelligence and Statistics, 2018.](https://mlanthology.org/aistats/2018/yin2018aistats-gradient/)

BibTeX

@inproceedings{yin2018aistats-gradient,
  title     = {{Gradient Diversity: A Key Ingredient for Scalable Distributed Learning}},
  author    = {Yin, Dong and Pananjady, Ashwin and Lam, Maximilian and Papailiopoulos, Dimitris S. and Ramchandran, Kannan and Bartlett, Peter L.},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  year      = {2018},
  pages     = {1998--2007},
  url       = {https://mlanthology.org/aistats/2018/yin2018aistats-gradient/}
}