Towards Understanding Acceleration Tradeoff Between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Tianyi Liu, Shiyang Li, Jianping Shi, Enlu Zhou, Tuo Zhao

NeurIPS 2018 pp. 3682-3692

Abstract

Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD) have been widely used in distributed machine learning, e.g., training large collaborative filtering systems and deep neural networks. Owing to current technical limitations, however, establishing convergence properties of Async-MSGD for these highly complicated nonconvex problems is generally infeasible. Therefore, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problem: streaming PCA. This allows us to make progress toward understanding Async-MSGD and to gain new insights for more general problems. Specifically, by exploiting the diffusion approximation of stochastic optimization, we establish the asymptotic rate of convergence of Async-MSGD for streaming PCA. Our results indicate a fundamental tradeoff between asynchrony and momentum: to ensure convergence and acceleration through asynchrony, we have to reduce the momentum (compared with Sync-MSGD). To the best of our knowledge, this is the first theoretical attempt at understanding Async-MSGD for distributed nonconvex stochastic optimization. Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD.
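
To make the setting concrete, the following is a minimal single-process sketch of a momentum update for streaming PCA in which the stochastic gradient is evaluated at a stale iterate. It is an illustration under stated assumptions, not the authors' implementation: the function name `async_msgd_streaming_pca`, the parameters `eta` (step size), `mu` (momentum), and `tau` (delay), the fixed delay model, and the renormalization step are all choices made here for the example.

```python
from collections import deque

import numpy as np


def async_msgd_streaming_pca(cov_sqrt, steps, eta, mu, tau, seed=0):
    """Simulate momentum SGD for streaming PCA with a fixed gradient delay.

    Assumptions (not taken from the paper): samples are x = cov_sqrt @ z with
    z standard normal, the delay is a constant tau, and the iterate is
    renormalized to the unit sphere after every step.
    """
    rng = np.random.default_rng(seed)
    d = cov_sqrt.shape[0]

    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    v_prev = v.copy()

    # Keep the last tau + 1 iterates; history[0] is the stalest one available.
    history = deque([v.copy()], maxlen=tau + 1)

    for _ in range(steps):
        x = cov_sqrt @ rng.standard_normal(d)   # one streaming sample
        stale = history[0]                      # iterate up to tau steps old
        proj = x @ stale
        # Stochastic gradient (I - s s^T) x x^T s evaluated at the stale iterate s.
        grad = proj * x - (proj ** 2) * stale
        # Momentum step uses the current iterates, the gradient uses the stale one.
        v_next = v + mu * (v - v_prev) + eta * grad
        v_next /= np.linalg.norm(v_next)        # projection added for this sketch
        v_prev, v = v, v_next
        history.append(v.copy())

    return v
```

For example, calling the function with `cov_sqrt = np.diag([2.0, 1.0, 1.0, 1.0, 1.0])` lets one compare how well the returned vector aligns with the first coordinate axis for different `(mu, tau)` pairs. The tradeoff described in the abstract suggests that a larger `tau` calls for a smaller `mu`, though this toy simulation is not a substitute for the paper's analysis.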

Cite

Text

Liu et al. "Towards Understanding Acceleration Tradeoff Between Momentum and Asynchrony in Nonconvex Stochastic Optimization." Neural Information Processing Systems, 2018.

Markdown

[Liu et al. "Towards Understanding Acceleration Tradeoff Between Momentum and Asynchrony in Nonconvex Stochastic Optimization." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/liu2018neurips-understanding/)

BibTeX

@inproceedings{liu2018neurips-understanding,
  title     = {{Towards Understanding Acceleration Tradeoff Between Momentum and Asynchrony in Nonconvex Stochastic Optimization}},
  author    = {Liu, Tianyi and Li, Shiyang and Shi, Jianping and Zhou, Enlu and Zhao, Tuo},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {3682-3692},
  url       = {https://mlanthology.org/neurips/2018/liu2018neurips-understanding/}
}