Towards Multi-Modal Transformers in Federated Learning

Abstract

Multi-modal transformers mark significant progress in different domains, but privacy concerns over high-quality data hinder their further improvement. Federated learning (FL) has emerged as a promising privacy-preserving paradigm for training models without direct access to the raw data held by different clients. Despite its potential, a considerable research direction regarding unpaired uni-modal clients and the transformer architecture in FL remains unexplored. To fill this gap, this paper explores a transfer multi-modal federated learning (MFL) scenario within the vision-language domain, where clients possess data of various modalities distributed across different datasets. We systematically evaluate the performance of existing methods when a transformer architecture is utilized and introduce a novel framework called Federated modality complementary and collaboration (FedCola), which addresses the in-modality and cross-modality gaps among clients. Through extensive experiments across various FL settings, FedCola demonstrates superior performance over previous approaches, offering new perspectives on future federated training of multi-modal transformers. Code is available at https://github.com/imguangyu/FedCola.
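
To make the setting concrete, below is a minimal sketch of one communication round in this kind of multi-modal FL scenario: uni-modal vision and language clients share a transformer encoder that the server aggregates, while modality-specific embedding layers stay local. This is a generic FedAvg-style illustration, not the FedCola algorithm itself; all class and function names here are hypothetical.

```python
# Minimal sketch (assumption: FedAvg-style aggregation of only the shared
# transformer encoder; NOT the FedCola method from the paper).
import copy
import torch
import torch.nn as nn

class UniModalClientModel(nn.Module):
    """A client model: modality-specific embedder + shared transformer encoder."""
    def __init__(self, embed: nn.Module, dim: int = 64):
        super().__init__()
        self.embed = embed  # modality-specific front-end (kept local)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # shared (aggregated)

    def forward(self, x):
        return self.encoder(self.embed(x))

def fedavg_shared(models, weights):
    """Weighted average of the shared encoder parameters across clients."""
    avg = copy.deepcopy(models[0].encoder.state_dict())
    for key in avg:
        avg[key] = sum(w * m.encoder.state_dict()[key]
                       for m, w in zip(models, weights))
    return avg

# Two unpaired uni-modal clients: one "vision" (patch features -> tokens),
# one "language" (token ids -> embeddings).
dim = 64
vision_client = UniModalClientModel(nn.Linear(32, dim), dim)
text_client = UniModalClientModel(nn.Embedding(1000, dim), dim)

# ... local training on each client's private data would happen here ...

# Server-side aggregation of the shared encoder only
# (weights would typically reflect client data sizes).
global_encoder = fedavg_shared([vision_client, text_client], weights=[0.5, 0.5])
for client in (vision_client, text_client):
    client.encoder.load_state_dict(global_encoder)  # broadcast back to clients
```

Under these assumptions, the in-modality gap corresponds to heterogeneity among clients of the same modality, while the cross-modality gap arises when averaging the shared encoder across vision and language clients; the paper's contribution is a framework for handling both.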

Cite

Text

Sun et al. "Towards Multi-Modal Transformers in Federated Learning." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72633-0_13

Markdown

[Sun et al. "Towards Multi-Modal Transformers in Federated Learning." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/sun2024eccv-multimodal/) doi:10.1007/978-3-031-72633-0_13

BibTeX

@inproceedings{sun2024eccv-multimodal,
  title     = {{Towards Multi-Modal Transformers in Federated Learning}},
  author    = {Sun, Guangyu and Mendieta, Matias and Dutta, Aritra and Li, Xin and Chen, Chen},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72633-0_13},
  url       = {https://mlanthology.org/eccv/2024/sun2024eccv-multimodal/}
}