CLIP-Guided Federated Learning on Heterogeneity and Long-Tailed Data

Shi, Jiangming; Zheng, Shanshan; Yin, Xiangbo; Lu, Yang; Xie, Yuan; Qu, Yanyun

doi:10.1609/AAAI.V38I13.29416

Jiangming Shi, Shanshan Zheng, Xiangbo Yin, Yang Lu, Yuan Xie, Yanyun Qu

AAAI 2024 pp. 14955-14963

Abstract

Federated learning (FL) provides a decentralized machine learning paradigm in which a server collaborates with a group of clients to learn a global model without accessing the clients' data. User heterogeneity is a significant challenge for FL, and class-distribution imbalance makes the problem harder still. Large vision-language models such as Contrastive Language-Image Pre-training (CLIP) have made great progress and open a new path for image classification and object recognition. Inspired by the success of CLIP in few-shot and zero-shot learning, we use CLIP's vision-language supervision to guide federated learning between the server and client models. Its powerful cross-modality representation and rich open-vocabulary prior knowledge make it a promising way to mitigate both user heterogeneity and class-distribution imbalance. In this paper, we propose the CLIP-guided FL (CLIP2FL) method for heterogeneous and long-tailed data. In CLIP2FL, the knowledge of an off-the-shelf CLIP model is transferred to the client and server models, and a bridge is built between the client and the server. Specifically, for client-side learning, knowledge distillation is conducted between the client models and CLIP to improve client-side feature representation. For server-side learning, to mitigate heterogeneity and class-distribution imbalance, we generate federated features to retrain the server model. Prototype contrastive learning, supervised by the text encoder of CLIP, is introduced to generate federated features from the client-side gradients, and these features are used to retrain a balanced server classifier. Extensive experimental results on several benchmarks demonstrate that CLIP2FL achieves impressive performance and effectively handles data heterogeneity and long-tailed distributions. The code is available at https://github.com/shijiangming1/CLIP2FL.
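
The client-side step described in the abstract, distilling knowledge from CLIP into a client model, can be illustrated with a generic distillation loss. The following is a minimal sketch rather than the authors' implementation: it assumes the teacher logits stand in for CLIP's image-text similarity scores, and the function names, the temperature, and the `kd_weight` mixing factor are all hypothetical. The exact loss used by CLIP2FL may differ; see the linked repository for the actual code.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def client_distillation_loss(student_logits, teacher_logits, label,
                             temperature=2.0, kd_weight=0.5):
    """Cross-entropy on the true label plus a temperature-scaled KL term.

    teacher_logits is a stand-in for CLIP's image-text similarity scores
    (an assumption for illustration, not the paper's exact formulation).
    """
    # Supervised term: negative log-probability of the true class.
    ce = -math.log(softmax(student_logits)[label] + 1e-12)
    # Distillation term: pull the student's softened distribution
    # toward the teacher's softened distribution.
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kd = kl_divergence(teacher_probs, student_probs) * temperature ** 2
    return (1.0 - kd_weight) * ce + kd_weight * kd


# Toy 3-class example with made-up logits; class 0 is the true label.
teacher = [4.0, 0.5, 0.1]
loss_far = client_distillation_loss([0.1, 0.2, 3.0], teacher, label=0)
loss_near = client_distillation_loss([3.5, 0.4, 0.2], teacher, label=0)
print(loss_far, loss_near)
```

In this toy setup, a student whose predictions agree with the teacher and the label receives a lower loss than one that favors the wrong class, which is the behavior a distillation objective is meant to encourage.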

Cite

Text

Shi et al. "CLIP-Guided Federated Learning on Heterogeneity and Long-Tailed Data." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I13.29416

Markdown

[Shi et al. "CLIP-Guided Federated Learning on Heterogeneity and Long-Tailed Data." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/shi2024aaai-clip/) doi:10.1609/AAAI.V38I13.29416

BibTeX

@inproceedings{shi2024aaai-clip,
  title     = {{CLIP-Guided Federated Learning on Heterogeneity and Long-Tailed Data}},
  author    = {Shi, Jiangming and Zheng, Shanshan and Yin, Xiangbo and Lu, Yang and Xie, Yuan and Qu, Yanyun},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {14955--14963},
  doi       = {10.1609/AAAI.V38I13.29416},
  url       = {https://mlanthology.org/aaai/2024/shi2024aaai-clip/}
}