Zeno++: Robust Fully Asynchronous SGD
Abstract
We propose Zeno++, a new robust asynchronous Stochastic Gradient Descent (SGD) procedure, intended to tolerate Byzantine failures of workers. In contrast to previous work, Zeno++ removes several unrealistic restrictions on worker-server communication, now allowing for fully asynchronous updates from anonymous workers, for arbitrarily stale worker updates, and for the possibility of an unbounded number of Byzantine workers. The key idea is to estimate the descent of the loss value after the candidate gradient is applied, where large descent values indicate that the update results in optimization progress. We prove the convergence of Zeno++ for non-convex problems under Byzantine failures. Experimental results show that Zeno++ outperforms existing Byzantine-tolerant asynchronous SGD algorithms.
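The key idea in the abstract, scoring a candidate gradient by the loss descent it would produce, can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional least-squares loss, the small server-side batch, the zero threshold, and the names `loss`, `descent_score`, and `filter_update` are all assumptions made for illustration. The sketch evaluates the loss directly before and after the candidate step, which may differ from how the paper estimates descent.

```python
# Illustrative sketch only: a server-side filter that accepts a candidate
# gradient when the estimated loss descent is large enough.
# All names, the toy loss, and the threshold are hypothetical.

def loss(x, batch):
    """Mean squared error of the 1-D linear model y = x * a on a small batch."""
    return sum((x * a - y) ** 2 for a, y in batch) / len(batch)

def descent_score(x, candidate_grad, batch, lr):
    """Estimated loss descent: loss before the step minus loss after it."""
    return loss(x, batch) - loss(x - lr * candidate_grad, batch)

def filter_update(x, candidate_grad, batch, lr, threshold=0.0):
    """Apply the candidate gradient only if its descent score meets the threshold.

    Returns the (possibly unchanged) parameter and whether the update was accepted.
    """
    if descent_score(x, candidate_grad, batch, lr) >= threshold:
        return x - lr * candidate_grad, True
    return x, False

# Small batch held by the server; the data follow y = 2 * a.
server_batch = [(1.0, 2.0), (2.0, 4.0)]

# A gradient pointing downhill at x = 0 is accepted.
x_after_honest, honest_accepted = filter_update(0.0, -10.0, server_batch, lr=0.1)

# A sign-flipped gradient increases the loss and is rejected.
x_after_flipped, flipped_accepted = filter_update(0.0, 10.0, server_batch, lr=0.1)
```

Because each candidate is judged on its own against the server's batch, a filter of this shape needs neither worker identities nor a majority of honest workers, which matches the setting the abstract describes.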
Cite
Text
Xie et al. "Zeno++: Robust Fully Asynchronous SGD." International Conference on Machine Learning, 2020.
Markdown
[Xie et al. "Zeno++: Robust Fully Asynchronous SGD." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/xie2020icml-zeno/)
BibTeX
@inproceedings{xie2020icml-zeno,
title = {{Zeno++: Robust Fully Asynchronous SGD}},
author = {Xie, Cong and Koyejo, Sanmi and Gupta, Indranil},
booktitle = {International Conference on Machine Learning},
year = {2020},
pages = {10495--10503},
volume = {119},
url = {https://mlanthology.org/icml/2020/xie2020icml-zeno/}
}