GPU Asynchronous Stochastic Gradient Descent to Speed up Neural Network Training

Abstract

The ability to train large-scale neural networks has resulted in state-of-the-art performance in many areas of computer vision. These results have largely come from computational breakthroughs of two forms: model parallelism, e.g. GPU-accelerated training, which has seen quick adoption in computer vision circles, and data parallelism, e.g. asynchronous SGD (A-SGD), which has been used at large scale mostly in industry. We report early experiments with a system that makes use of both model parallelism and data parallelism, which we call GPU A-SGD. We show that using GPU A-SGD it is possible to speed up training of large convolutional neural networks useful for computer vision. We believe GPU A-SGD will make it possible to train larger networks on larger training sets in a reasonable amount of time.
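To make the data-parallel half of this idea concrete, the sketch below runs several asynchronous SGD workers against a shared parameter vector on a toy least-squares problem. It is a minimal illustrative sketch, not the authors' GPU implementation or parameter-server protocol; all names, the toy data, and the threading setup are assumptions introduced here. In GPU A-SGD, each worker would itself be a GPU-trained model replica.

# Illustrative sketch (assumed, not the paper's code): data-parallel
# asynchronous SGD. Workers compute gradients on their own data shards
# and apply updates to shared parameters without waiting for each other,
# so each read may see a slightly stale copy of the parameters.
import threading
import numpy as np

# Toy least-squares problem: recover true_w from noisy observations.
rng = np.random.default_rng(0)
true_w = rng.normal(size=10)
X = rng.normal(size=(4000, 10))
y = X @ true_w + 0.01 * rng.normal(size=4000)

shared_w = np.zeros(10)          # parameters shared by all workers
lock = threading.Lock()          # protects the in-place update step
lr, batch_size, steps = 0.05, 32, 500

def worker(shard_X, shard_y, seed):
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        idx = local_rng.integers(0, len(shard_X), size=batch_size)
        xb, yb = shard_X[idx], shard_y[idx]
        # Read a possibly stale snapshot of the parameters (asynchrony).
        w = shared_w.copy()
        grad = 2.0 * xb.T @ (xb @ w - yb) / batch_size
        # Push the update without synchronizing gradient computation.
        with lock:
            shared_w -= lr * grad

threads = [threading.Thread(target=worker, args=(X[i::4], y[i::4], i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("parameter error:", np.linalg.norm(shared_w - true_w))

The lock only serializes the in-place numpy update; gradient computation proceeds concurrently, which is where the speedup over synchronous mini-batch SGD comes from in this style of training.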

Cite

Text

Paine et al. "GPU Asynchronous Stochastic Gradient Descent to Speed up Neural Network Training." International Conference on Learning Representations, 2014.

Markdown

[Paine et al. "GPU Asynchronous Stochastic Gradient Descent to Speed up Neural Network Training." International Conference on Learning Representations, 2014.](https://mlanthology.org/iclr/2014/paine2014iclr-gpu/)

BibTeX

@inproceedings{paine2014iclr-gpu,
  title     = {{GPU Asynchronous Stochastic Gradient Descent to Speed up Neural Network Training}},
  author    = {Paine, Thomas and Jin, Hailin and Yang, Jianchao and Lin, Zhe and Huang, Thomas S.},
  booktitle = {International Conference on Learning Representations},
  year      = {2014},
  url       = {https://mlanthology.org/iclr/2014/paine2014iclr-gpu/}
}