Throughput-Optimal Topology Design for Cross-Silo Federated Learning

Abstract

Federated learning usually employs a client-server architecture where an orchestrator iteratively aggregates model updates from remote clients and pushes a refined model back to them. This approach may be inefficient in cross-silo settings, as nearby data silos with high-speed access links may exchange information faster with each other than with the orchestrator, and the orchestrator may become a communication bottleneck. In this paper we define the problem of topology design for cross-silo federated learning, using the theory of max-plus linear systems to compute the system throughput (number of communication rounds per time unit). We also propose practical algorithms that, given knowledge of measurable network characteristics, find a topology with the largest throughput or with provable throughput guarantees. In realistic Internet networks with 10 Gbps access links for silos, our algorithms speed up training by factors of 9 and 1.5 compared to the master-slave architecture and to the state-of-the-art MATCHA approach, respectively. Speedups are even larger with slower access links.
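To give a rough feel for the max-plus viewpoint mentioned in the abstract (this is an illustrative sketch, not the paper's algorithm): for a max-plus linear system driven by a strongly connected graph of communication delays, the asymptotic cycle time equals the maximum cycle mean of the delay matrix, and the throughput is its inverse. The Python sketch below computes that quantity with Karp's maximum-cycle-mean algorithm; the graph encoding and the example delay values are assumptions made for illustration.

import math

def max_cycle_mean(delays):
    # delays[u][v] = per-round delay on edge (u, v); every node appears as a key,
    # absent destinations mean no edge.
    nodes = list(delays)
    n = len(nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    NEG = -math.inf

    # D[k][v] = maximum total delay of a walk of length k from the source to v.
    D = [[NEG] * n for _ in range(n + 1)]
    D[0][0] = 0.0  # arbitrary source; valid when the delay graph is strongly connected
    for k in range(1, n + 1):
        for u in nodes:
            if D[k - 1][idx[u]] == NEG:
                continue
            for v, w in delays[u].items():
                D[k][idx[v]] = max(D[k][idx[v]], D[k - 1][idx[u]] + w)

    # Karp's formula: lambda* = max_v min_k (D[n][v] - D[k][v]) / (n - k)
    best = NEG
    for v in range(n):
        if D[n][v] == NEG:
            continue
        best = max(best, min((D[n][v] - D[k][v]) / (n - k)
                             for k in range(n) if D[k][v] != NEG))
    return best

# Hypothetical example: three silos passing updates on a directed ring.
delays = {"A": {"B": 2.0}, "B": {"C": 3.0}, "C": {"A": 1.0}}
cycle_time = max_cycle_mean(delays)   # 2.0 time units per communication round
throughput = 1.0 / cycle_time         # 0.5 rounds per time unit

In this toy setting, the ring's total delay (6.0) divided by its length (3) gives the cycle time, so a topology with a smaller maximum cycle mean sustains more communication rounds per time unit.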

Cite

Text

Marfoq et al. "Throughput-Optimal Topology Design for Cross-Silo Federated Learning." Neural Information Processing Systems, 2020.

Markdown

[Marfoq et al. "Throughput-Optimal Topology Design for Cross-Silo Federated Learning." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/marfoq2020neurips-throughputoptimal/)

BibTeX

@inproceedings{marfoq2020neurips-throughputoptimal,
  title     = {{Throughput-Optimal Topology Design for Cross-Silo Federated Learning}},
  author    = {Marfoq, Othmane and Xu, Chuan and Neglia, Giovanni and Vidal, Richard},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/marfoq2020neurips-throughputoptimal/}
}