Federated Learning on Virtual Heterogeneous Data with Local-Global Dataset Distillation

Abstract

While Federated Learning (FL) is gaining popularity for training machine learning models in a decentralized fashion, numerous challenges persist, such as client asynchrony, computational expense, data heterogeneity, and gradient and membership privacy attacks. Lately, dataset distillation has emerged as a promising solution to these challenges by generating a compact synthetic dataset that preserves a model's training efficacy. However, we discover that using distilled local datasets can amplify the heterogeneity issue in FL. To address this, we propose Federated Learning on Virtual Heterogeneous Data with Local-Global Dataset Distillation (FedLGD), where we seamlessly integrate dataset distillation algorithms into the FL pipeline and train FL on a smaller synthetic dataset (referred to as virtual data). Specifically, to harmonize domain shifts, we propose iterative distribution matching to inpaint global information into *local virtual data*, and we use federated gradient matching to distill *global virtual data* that serve as anchor points to rectify heterogeneous local training, without compromising data privacy. We experiment on both benchmark and real-world datasets that contain heterogeneous data from different sources, and further scale up to an FL scenario with a large number of clients holding heterogeneous and class-imbalanced data. Our method outperforms *state-of-the-art* heterogeneous FL algorithms under various settings.
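To make the distribution-matching idea in the abstract concrete, below is a minimal, illustrative sketch of distilling a small set of virtual samples so that their feature statistics match those of real data. The function name, the identity feature map, and all hyperparameters are assumptions for illustration only; FedLGD's actual iterative distribution matching operates on learned network embeddings and alternates with federated training.

```python
import numpy as np

def distill_by_distribution_matching(real, n_virtual=10, lr=0.5, steps=200, seed=0):
    """Hypothetical sketch: optimize virtual samples to match the real data's
    mean feature statistics. The feature map here is the identity, so the
    loss is || mean(virtual) - mean(real) ||^2."""
    rng = np.random.default_rng(seed)
    # Initialize virtual ("distilled") samples from random noise.
    virtual = rng.normal(size=(n_virtual, real.shape[1]))
    target_mean = real.mean(axis=0)
    for _ in range(steps):
        # Mismatch between the virtual and real feature means.
        diff = virtual.mean(axis=0) - target_mean
        # Gradient of the loss w.r.t. each virtual sample is (2 / n_virtual) * diff;
        # broadcasting applies the same update to every row.
        virtual -= lr * (2.0 / n_virtual) * diff
    return virtual
```

In FedLGD, an analogous matching step runs locally at each client using globally aggregated statistics, which is how global information is "inpainted" into the local virtual data; a gradient-matching objective on the server side plays the corresponding role for the global virtual data.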

Cite

Text

Huang et al. "Federated Learning on Virtual Heterogeneous Data with Local-Global Dataset Distillation." Transactions on Machine Learning Research, 2025.

Markdown

[Huang et al. "Federated Learning on Virtual Heterogeneous Data with Local-Global Dataset Distillation." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/huang2025tmlr-federated/)

BibTeX

@article{huang2025tmlr-federated,
  title     = {{Federated Learning on Virtual Heterogeneous Data with Local-Global Dataset Distillation}},
  author    = {Huang, Chun-Yin and Jin, Ruinan and Zhao, Can and Xu, Daguang and Li, Xiaoxiao},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/huang2025tmlr-federated/}
}