Coordinating Distributed Example Orders for Provably Accelerated Training

Abstract

Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples, achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms distributed RR on a variety of benchmark tasks.
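To make the ordering idea concrete, below is a minimal single-worker sketch of the greedy gradient-balancing step that GraB-style methods build on: stale per-example gradients from the previous epoch are centered, and each example is greedily assigned to the front or back of the next epoch's ordering so that a running prefix sum of signed gradients stays small. The function name (balance_order) and the NumPy-based layout are illustrative assumptions; this is not the authors' CD-GraB implementation, which additionally coordinates orderings across distributed workers.

import numpy as np

def balance_order(stale_grads):
    """Greedy gradient-balancing reorder (herding-style sign assignment).

    stale_grads: (n, d) array of per-example gradients saved from the
    previous epoch, indexed by the order in which examples were visited.
    Returns a permutation (list of example indices) for the next epoch.
    """
    n, d = stale_grads.shape
    centered = stale_grads - stale_grads.mean(axis=0)  # balance around the mean gradient

    running = np.zeros(d)           # running signed prefix sum
    new_order = [None] * n
    left, right = 0, n - 1          # fill the next ordering from both ends

    for i, z in enumerate(centered):
        # choose the sign that keeps the running prefix sum small
        if np.linalg.norm(running + z) <= np.linalg.norm(running - z):
            running += z
            new_order[left] = i     # "+1" examples go to the front
            left += 1
        else:
            running -= z
            new_order[right] = i    # "-1" examples go to the back
            right -= 1
    return new_order

In the distributed setting described in the abstract, CD-GraB coordinates this kind of balancing across workers (drawing on kernel-thinning results) rather than running it independently on each worker, which is what yields the linear speedup in convergence rate over centralized GraB.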

Cite

Text

Cooper et al. "Coordinating Distributed Example Orders for Provably Accelerated Training." Neural Information Processing Systems, 2023.

Markdown

[Cooper et al. "Coordinating Distributed Example Orders for Provably Accelerated Training." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/cooper2023neurips-coordinating/)

BibTeX

@inproceedings{cooper2023neurips-coordinating,
  title     = {{Coordinating Distributed Example Orders for Provably Accelerated Training}},
  author    = {Cooper, A. Feder and Guo, Wentao and Pham, Duc Khiem and Yuan, Tiancheng and Ruan, Charlie and Lu, Yucheng and De Sa, Christopher M},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/cooper2023neurips-coordinating/}
}