Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data
Abstract
This paper introduces a generalized federated prompt-tuning framework for practical scenarios in which local datasets are multimodal and exhibit different distributional patterns of missing features at the input level. The proposed framework bridges the gap between federated learning and multimodal prompt-tuning, which have traditionally focused on uni-modal or centralized data, respectively. A key challenge in this setting arises from the lack of semantic alignment between prompt instructions that encode similar distributional patterns of missing data across different clients. To address this, our framework introduces specialized client-tuning and server-aggregation designs that simultaneously optimize, align, and aggregate prompt-tuning instructions across clients and data modalities, allowing prompt instructions to complement one another and be combined effectively. Extensive evaluations on diverse multimodal benchmark datasets demonstrate that our framework consistently outperforms state-of-the-art (SOTA) baselines.
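To make the server-aggregation idea concrete, below is a minimal sketch of per-modality prompt aggregation in a federated setting, where each client only contributes prompt vectors for the modalities present in its local data. This is an illustrative simplification under assumed conventions (the function name aggregate_prompts, FedAvg-style weighting by local dataset size, and dict-of-tensors prompt storage are all hypothetical), not the paper's exact client-tuning or alignment design.

import torch

def aggregate_prompts(client_prompts, client_weights):
    """Hypothetical FedAvg-style aggregation of per-modality prompts.

    client_prompts: list of dicts {modality: prompt tensor of shape (L, d)};
        a client omits modalities that are missing from its local data.
    client_weights: list of floats (e.g., local dataset sizes).
    """
    aggregated = {}
    for prompts, w in zip(client_prompts, client_weights):
        for modality, p in prompts.items():
            if modality not in aggregated:
                aggregated[modality] = [torch.zeros_like(p), 0.0]
            aggregated[modality][0] += w * p   # weighted sum of prompts
            aggregated[modality][1] += w       # total weight for this modality
    # Normalize each modality only over the clients that actually hold it.
    return {m: s / total for m, (s, total) in aggregated.items()}

# Toy usage: two clients, prompt length 4, embedding dim 8.
c1 = {"image": torch.randn(4, 8), "text": torch.randn(4, 8)}
c2 = {"image": torch.randn(4, 8)}  # text modality missing locally
global_prompts = aggregate_prompts([c1, c2], client_weights=[100.0, 50.0])
print({m: p.shape for m, p in global_prompts.items()})

Note that this sketch only averages; the paper's contribution additionally aligns prompt instructions semantically before aggregation, which plain averaging does not capture.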
Cite
Phung et al. "Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data." International Conference on Computer Vision, 2025. https://mlanthology.org/iccv/2025/phung2025iccv-federated/

BibTeX:
@inproceedings{phung2025iccv-federated,
title = {{Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data}},
author = {Phung, Thu Hang and Nguyen, Duong M. and Huynh, Thanh Trung and Nguyen, Quoc Viet Hung and Hoang, Trong Nghia and Le Nguyen, Phi},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {3936--3946},
url = {https://mlanthology.org/iccv/2025/phung2025iccv-federated/}
}