Learning to Prompt Your Domain for Federated Vision-Language Models

Abstract

The prompt tuning paradigm, with its advantages of low parameter count and stable training, has recently inspired numerous applications of CLIP-like vision-language models in federated learning. However, in this work, we posit that under significant domain gaps across federated participants, prompt-based CLIP may easily collapse to non-optimal solutions because it neglects domain-aware knowledge. We present a novel prompt tuning method, termed ADAPT, that addresses this issue by learning both intra- and inter-domain prompts. Specifically, we assign each federated participant a domain-specific prompt and use the image's visual features as a condition to guide the generation of language features, with the underlying idea that the prompted CLIP should detect the input image's domain correspondence before predicting its category. Extensive experiments demonstrate ADAPT's significant efficiency and effectiveness in federated learning. For example, by learning and sharing only 2.1M parameters, ADAPT attains 69.8% average accuracy over the six domains of DomainNet, improving the original CLIP accuracy by 16.2%.
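
The abstract describes two pieces: a learnable prompt per federated participant (inter-domain) and an image-conditioned component that steers the language features (intra-domain). The sketch below is a hypothetical PyTorch illustration of that idea, not the authors' released code: the class and module names (DomainPromptedCLIP, meta_net), the prompt length, and the stand-in encoders are all assumptions made for the example; a real implementation would use frozen CLIP image and text towers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DomainPromptedCLIP(nn.Module):
    """Hypothetical sketch of domain-aware prompt tuning for a CLIP-like model:
    one learnable prompt per federated participant (inter-domain) plus an
    image-conditioned offset that guides the text features (intra-domain)."""

    def __init__(self, image_encoder, text_encoder, class_token_embeds,
                 num_domains, prompt_len=4, embed_dim=512):
        super().__init__()
        self.image_encoder = image_encoder              # frozen image tower (assumed)
        self.text_encoder = text_encoder                # frozen text tower (assumed)
        self.class_token_embeds = class_token_embeds    # [num_classes, name_len, embed_dim]

        # Inter-domain knowledge: one learnable prompt per participant/domain.
        self.domain_prompts = nn.Parameter(
            torch.randn(num_domains, prompt_len, embed_dim) * 0.02)

        # Intra-domain knowledge: map visual features to a prompt offset.
        self.meta_net = nn.Sequential(
            nn.Linear(embed_dim, embed_dim // 16),
            nn.ReLU(inplace=True),
            nn.Linear(embed_dim // 16, embed_dim))

    def forward(self, images, domain_id):
        img_feat = F.normalize(self.image_encoder(images), dim=-1)    # [B, D]

        # Condition the shared domain prompt on each image's visual features.
        bias = self.meta_net(img_feat)                                # [B, D]
        prompt = self.domain_prompts[domain_id]                       # [P, D]
        prompt = prompt.unsqueeze(0) + bias.unsqueeze(1)              # [B, P, D]

        # Prepend the prompt to every class-name embedding and encode.
        B, C = images.size(0), self.class_token_embeds.size(0)
        cls = self.class_token_embeds.unsqueeze(0).expand(B, -1, -1, -1)  # [B, C, L, D]
        ctx = prompt.unsqueeze(1).expand(-1, C, -1, -1)                   # [B, C, P, D]
        text_in = torch.cat([ctx, cls], dim=2)                            # [B, C, P+L, D]
        txt_feat = F.normalize(self.text_encoder(text_in.flatten(0, 1)), dim=-1)
        txt_feat = txt_feat.view(B, C, -1)                                # [B, C, D]

        # CLIP-style cosine-similarity logits over classes.
        return 100.0 * torch.einsum("bd,bcd->bc", img_feat, txt_feat)


# Toy usage with stand-in encoders (purely illustrative shapes).
img_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
txt_enc = lambda tokens: tokens.mean(dim=1)            # placeholder text tower
class_embeds = torch.randn(10, 5, 512)                 # 10 classes, 5 name tokens each
model = DomainPromptedCLIP(img_enc, txt_enc, class_embeds, num_domains=6)
logits = model(torch.randn(2, 3, 224, 224), domain_id=0)  # [2, 10]
```

In a federated setting, only the small prompt-related parameters (domain_prompts and meta_net here) would be trained and shared, which is consistent with the abstract's 2.1M-parameter figure; the encoder weights stay frozen.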

Cite

Text

Wei et al. "Learning to Prompt Your Domain for Federated Vision-Language Models." Transactions on Machine Learning Research, 2025.

Markdown

[Wei et al. "Learning to Prompt Your Domain for Federated Vision-Language Models." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/wei2025tmlr-learning/)

BibTeX

@article{wei2025tmlr-learning,
  title     = {{Learning to Prompt Your Domain for Federated Vision-Language Models}},
  author    = {Wei, Guoyizhe and Wang, Feng and Shah, Anshul and Chellappa, Rama},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/wei2025tmlr-learning/}
}