The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication

Abstract

Local SGD is a popular optimization method in distributed learning, often outperforming mini-batch SGD. Despite this practical success, proving the efficiency of local SGD has been difficult, creating a significant gap between theory and practice. We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions, showing that these assumptions cannot capture local SGD's effectiveness. We also demonstrate the min-max optimality of accelerated mini-batch SGD under these assumptions. Our findings emphasize the need for improved modeling of data heterogeneity. Under higher-order assumptions, we provide new upper bounds that verify the dominance of local SGD over mini-batch SGD when data heterogeneity is low.
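To make the comparison concrete, here is a minimal sketch (not the paper's experimental setup) of local SGD with intermittent communication versus a mini-batch SGD baseline with the same communication budget. The client objectives are hypothetical quadratics f_m(x) = ½‖x − b_m‖², so data heterogeneity is controlled by how far apart the vectors b_m are; all parameter values are illustrative.

```python
# Minimal sketch: M clients, R communication rounds, H local steps per round.
# Local SGD runs H gradient steps on each client between rounds and averages;
# mini-batch SGD takes one step per round using all H*M stochastic gradients.
import numpy as np

def local_sgd(b, R=20, H=10, lr=0.1, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    M, d = b.shape
    x = np.zeros(d)                      # shared starting point
    for _ in range(R):                   # communication rounds
        local = np.tile(x, (M, 1))       # each client copies the global iterate
        for _ in range(H):               # local steps, no communication
            grads = local - b            # gradients of the quadratic objectives
            grads += noise * rng.standard_normal((M, d))  # stochastic noise
            local -= lr * grads
        x = local.mean(axis=0)           # intermittent communication: average
    return x

def minibatch_sgd(b, R=20, H=10, lr=0.1, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    M, d = b.shape
    x = np.zeros(d)
    for _ in range(R):                   # one step per communication round
        grads = np.tile(x, (M, 1)) - b
        grads = grads[None] + noise * rng.standard_normal((H, M, d))
        x -= lr * grads.mean(axis=(0, 1))  # average over the H*M samples
    return x

if __name__ == "__main__":
    # Three clients with nearby optima, i.e. low heterogeneity.
    b = np.array([[1.0, 0.0], [1.2, -0.1], [0.9, 0.2]])
    x_star = b.mean(axis=0)              # minimizer of the averaged objective
    print("local SGD error:     ", np.linalg.norm(local_sgd(b) - x_star))
    print("mini-batch SGD error:", np.linalg.norm(minibatch_sgd(b) - x_star))
```

Under this toy low-heterogeneity setting, the extra local progress between rounds typically lets local SGD reach a smaller error for the same number of communication rounds, which mirrors the regime the paper's higher-order upper bounds address.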

Cite

Text

Patel et al. "The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication." Conference on Learning Theory, 2024.

Markdown

[Patel et al. "The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication." Conference on Learning Theory, 2024.](https://mlanthology.org/colt/2024/patel2024colt-limits/)

BibTeX

@inproceedings{patel2024colt-limits,
  title     = {{The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication}},
  author    = {Patel, Kumar Kshitij and Glasgow, Margalit and Zindari, Ali and Wang, Lingxiao and Stich, Sebastian U and Cheng, Ziheng and Joshi, Nirmit and Srebro, Nathan},
  booktitle = {Conference on Learning Theory},
  year      = {2024},
  pages     = {4115--4157},
  volume    = {247},
  url       = {https://mlanthology.org/colt/2024/patel2024colt-limits/}
}