On the Still Unreasonable Effectiveness of Federated Averaging for Heterogeneous Distributed Learning

Abstract

Federated Averaging/local SGD is the most common optimization method for federated learning and has proven effective in many real-world applications, dominating simple baselines like mini-batch SGD for convex and non-convex objectives. However, theoretically explaining the effectiveness of local SGD remains challenging, leaving a large gap between theory and practice. In this paper, we provide new lower bounds for local SGD for convex objectives, ruling out proposed heterogeneity assumptions that try to capture this *"unreasonable"* effectiveness of local SGD. We further show that accelerated mini-batch SGD is, in fact, min-max optimal under some of these heterogeneity notions. Our results indicate that strong convexity of each client's objective might be necessary to exploit several heterogeneity assumptions. This also highlights the need for new heterogeneity assumptions for federated optimization in the general convex setting, and we discuss some alternative assumptions.
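
To make the two baselines compared in the abstract concrete, here is a minimal illustrative sketch (not code from the paper) contrasting local SGD (Federated Averaging) with mini-batch SGD on heterogeneous quadratic client objectives; all names, step sizes, and the noise model are illustrative assumptions.

```python
# Minimal sketch (not from the paper): local SGD (FedAvg) vs. mini-batch SGD
# on M clients with heterogeneous quadratics f_m(x) = 0.5 * ||x - c_m||^2.
import numpy as np

rng = np.random.default_rng(0)
M, d = 8, 5                      # number of clients, dimension (illustrative)
C = rng.normal(size=(M, d))      # client optima c_m; their spread models heterogeneity
x_star = C.mean(axis=0)          # minimizer of the average objective

def grad(m, x):
    """Stochastic gradient of f_m at x (Gaussian noise stands in for sampling)."""
    return (x - C[m]) + 0.1 * rng.normal(size=d)

def local_sgd(R=20, K=10, lr=0.1):
    """R communication rounds; each client runs K local SGD steps, then the server averages."""
    x = np.zeros(d)
    for _ in range(R):
        client_iterates = []
        for m in range(M):
            y = x.copy()
            for _ in range(K):           # K local updates without communication
                y -= lr * grad(m, y)
            client_iterates.append(y)
        x = np.mean(client_iterates, axis=0)   # server averages client iterates
    return x

def minibatch_sgd(R=20, K=10, lr=0.1):
    """Same gradient budget: one update per round using a batch of K gradients per client."""
    x = np.zeros(d)
    for _ in range(R):
        g = np.mean([grad(m, x) for m in range(M) for _ in range(K)], axis=0)
        x -= lr * g
    return x

print("local SGD error     :", np.linalg.norm(local_sgd() - x_star))
print("mini-batch SGD error:", np.linalg.norm(minibatch_sgd() - x_star))
```

Under heterogeneity (widely spread c_m), the local steps drift toward each client's own optimum, which is exactly the effect that heterogeneity assumptions in the literature try to bound; the paper's lower bounds concern how far such assumptions can go in the general convex setting.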

Cite

Text

Patel et al. "On the Still Unreasonable Effectiveness of Federated Averaging for Heterogeneous Distributed Learning." ICML 2023 Workshops: FL, 2023.

Markdown

[Patel et al. "On the Still Unreasonable Effectiveness of Federated Averaging for Heterogeneous Distributed Learning." ICML 2023 Workshops: FL, 2023.](https://mlanthology.org/icmlw/2023/patel2023icmlw-still/)

BibTeX

@inproceedings{patel2023icmlw-still,
  title     = {{On the Still Unreasonable Effectiveness of Federated Averaging for Heterogeneous Distributed Learning}},
  author    = {Patel, Kumar Kshitij and Glasgow, Margalit and Wang, Lingxiao and Joshi, Nirmit and Srebro, Nathan},
  booktitle = {ICML 2023 Workshops: FL},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/patel2023icmlw-still/}
}