A Predicting Clipping Asynchronous Stochastic Gradient Descent Method in Distributed Learning
Abstract
In this paper, we propose a new algorithm, termed Predicting Clipping Asynchronous Stochastic Gradient Descent (PC-ASGD), to address the issues of staleness and time delay in asynchronous distributed learning settings. Specifically, PC-ASGD consists of two steps: the predicting step leverages gradient prediction via Taylor expansion to reduce the staleness of the outdated weights, while the clipping step selectively drops the outdated weights to alleviate their negative effects. A tradeoff parameter is introduced to balance the effects of these two steps. We theoretically establish the convergence rate of the proposed algorithm with constant step size for smooth nonconvex objective functions, accounting for the effects of delay. For empirical validation, we demonstrate the performance of the algorithm with two deep neural network architectures on two benchmark datasets.
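To make the two steps concrete, below is a minimal sketch of a PC-ASGD-style update for a single parameter vector. All names (`pc_asgd_step`, `theta`, `lam`) are hypothetical, and the Taylor-expansion-based prediction is approximated with a simple diagonal, first-order correction; this is an illustrative sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def pc_asgd_step(w, w_stale, g_fresh, g_stale, lr, theta, lam):
    """Hypothetical single-step sketch of a PC-ASGD-style update.

    w       : current weights
    w_stale : outdated weights from a delayed agent
    g_fresh : gradient computed at the current weights
    g_stale : gradient computed at the outdated weights
    lr      : constant step size
    theta   : tradeoff parameter in [0, 1] balancing the two steps
    lam     : scaling for the Taylor-style gradient correction (assumed)
    """
    # Predicting step: compensate the stale gradient with a cheap,
    # Taylor-expansion-style correction so it better reflects w.
    g_pred = g_stale + lam * g_stale * g_stale * (w - w_stale)
    w_predict = w - lr * (g_fresh + g_pred)

    # Clipping step: drop the outdated contribution entirely and
    # update with the fresh gradient only.
    w_clip = w - lr * g_fresh

    # The tradeoff parameter balances predicting and clipping.
    return theta * w_predict + (1.0 - theta) * w_clip

# Toy usage with random vectors, just to show the call shape.
rng = np.random.default_rng(0)
w = rng.normal(size=10)
w_stale = w + 0.01 * rng.normal(size=10)
g_fresh, g_stale = rng.normal(size=10), rng.normal(size=10)
w_next = pc_asgd_step(w, w_stale, g_fresh, g_stale,
                      lr=0.1, theta=0.5, lam=0.1)
```

Setting `theta = 1` recovers a purely prediction-based update, while `theta = 0` discards the delayed information altogether, which is the balance the tradeoff parameter is meant to control.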
Cite
Text
Wang et al. "A Predicting Clipping Asynchronous Stochastic Gradient Descent Method in Distributed Learning." NeurIPS 2023 Workshops: OPT, 2023.

Markdown

[Wang et al. "A Predicting Clipping Asynchronous Stochastic Gradient Descent Method in Distributed Learning." NeurIPS 2023 Workshops: OPT, 2023.](https://mlanthology.org/neuripsw/2023/wang2023neuripsw-predicting/)

BibTeX
@inproceedings{wang2023neuripsw-predicting,
  title     = {{A Predicting Clipping Asynchronous Stochastic Gradient Descent Method in Distributed Learning}},
  author    = {Wang, Haoxiang and Jiang, Zhanhong and Liu, Chao and Sarkar, Soumik and Jiang, Dongxiang and Lee, Young M},
  booktitle = {NeurIPS 2023 Workshops: OPT},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/wang2023neuripsw-predicting/}
}