Accelerating SGD for Distributed Deep-Learning Using an Approximated Hessian Matrix

Abstract

We introduce a novel method to compute a rank-$m$ approximation of the inverse Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters across multiple workers, we efficiently implement a distributed approximation of the Newton-Raphson method. We also present preliminary results that highlight the advantages and challenges of second-order methods for large stochastic optimization problems. In particular, our work suggests that novel strategies for combining gradients can provide further information about the loss surface.
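The abstract does not give the authors' exact algorithm, but building a rank-$m$ approximation of the inverse Hessian from parameter and gradient differences is the idea behind quasi-Newton methods such as L-BFGS. As an illustrative sketch only (not the paper's method), the snippet below applies the standard L-BFGS two-loop recursion to $m$ curvature pairs $(s_i, y_i)$, where each pair could be imagined as the parameter and gradient difference reported by a different worker; the function name is hypothetical.

```python
import numpy as np

def two_loop_direction(grad, s_list, y_list):
    """Approximate H^{-1} @ grad with the L-BFGS two-loop recursion.

    Each (s_i, y_i) is a (parameter difference, gradient difference)
    curvature pair; in a distributed setting, each pair could come
    from a different worker. This is an illustrative sketch, not the
    paper's algorithm.
    """
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: walk pairs from newest to oldest.
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * np.dot(s, q)
        alphas.append(alpha)
        q -= alpha * y
    # Scale by a standard initial inverse-Hessian estimate gamma * I.
    s, y = s_list[-1], y_list[-1]
    gamma = np.dot(s, y) / np.dot(y, y)
    r = gamma * q
    # Second loop: walk pairs from oldest to newest.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos),
                                  reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return r  # approximates H^{-1} @ grad
```

The cost is $O(mn)$ per step for $n$ parameters, which is why rank-$m$ approximations are attractive at deep-learning scale: the full $n \times n$ Hessian is never formed.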

Cite

Text

Arnold and Wang. "Accelerating SGD for Distributed Deep-Learning Using an Approximated Hessian Matrix." International Conference on Learning Representations, 2017.

Markdown

[Arnold and Wang. "Accelerating SGD for Distributed Deep-Learning Using an Approximated Hessian Matrix." International Conference on Learning Representations, 2017.](https://mlanthology.org/iclr/2017/arnold2017iclr-accelerating/)

BibTeX

@inproceedings{arnold2017iclr-accelerating,
  title     = {{Accelerating SGD for Distributed Deep-Learning Using an Approximated Hessian Matrix}},
  author    = {Arnold, Sébastien M. R. and Wang, Chunming},
  booktitle = {International Conference on Learning Representations},
  year      = {2017},
  url       = {https://mlanthology.org/iclr/2017/arnold2017iclr-accelerating/}
}