Synthetic Data Shuffling Accelerates the Convergence of Federated Learning Under Data Heterogeneity
Abstract
In federated learning, data heterogeneity is a critical challenge. A straightforward solution is to shuffle the clients' data to homogenize the distribution. However, this may violate data access rights, and how and when shuffling can accelerate the convergence of a federated optimization algorithm is not theoretically well understood. In this paper, we establish a precise and quantifiable correspondence between data heterogeneity and parameters in the convergence rate when a fraction of data is shuffled across clients. We discuss that shuffling can, in some cases, quadratically reduce the gradient dissimilarity with respect to the shuffling percentage, accelerating convergence. Inspired by the theory, we propose a practical approach that addresses the data access rights issue by shuffling locally generated synthetic data. The experimental results show that shuffling synthetic data improves the performance of multiple existing federated learning algorithms by a large margin.
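The core operation the paper analyzes, shuffling a fraction of each client's data across all clients, can be sketched as below. This is an illustrative simulation only, not the authors' implementation; the function name `shuffle_fraction` and the round-robin redistribution are assumptions made for the example (in the paper, the shuffled data would be locally generated synthetic samples rather than raw client data).

```python
import random

def shuffle_fraction(client_data, fraction, seed=0):
    """Move `fraction` of each client's samples into a shared pool,
    shuffle the pool, and redistribute it round-robin across clients.
    Illustrative sketch of partial data shuffling, not the paper's code."""
    rng = random.Random(seed)
    pool, kept = [], []
    for data in client_data:
        data = list(data)
        rng.shuffle(data)  # pick the contributed fraction at random
        k = int(len(data) * fraction)
        pool.extend(data[:k])
        kept.append(data[k:])
    rng.shuffle(pool)
    # Redistribute the pooled samples evenly across clients.
    for i, sample in enumerate(pool):
        kept[i % len(kept)].append(sample)
    return kept

# Example: two fully heterogeneous clients (all label 0 vs. all label 1).
clients = [[("x", 0)] * 10, [("x", 1)] * 10]
mixed = shuffle_fraction(clients, fraction=0.5)
```

At `fraction=0` the data stays fully heterogeneous; at `fraction=1` every client draws from the same pooled distribution. The paper's analysis quantifies how gradient dissimilarity shrinks, in some cases quadratically, as this fraction grows.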
Cite
Li et al. "Synthetic Data Shuffling Accelerates the Convergence of Federated Learning Under Data Heterogeneity." Transactions on Machine Learning Research, 2024.
BibTeX
@article{li2024tmlr-synthetic,
title = {{Synthetic Data Shuffling Accelerates the Convergence of Federated Learning Under Data Heterogeneity}},
author = {Li, Bo and Esfandiari, Yasin and Schmidt, Mikkel N. and Alstrøm, Tommy Sonne and Stich, Sebastian U.},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/li2024tmlr-synthetic/}
}