MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning

Abstract

Multi-Task Learning (MTL) is designed to train multiple correlated tasks simultaneously, thereby enhancing the performance of individual tasks. Typically, a multi-task network structure consists of a shared backbone and task-specific decoders. However, the complexity of the decoders increases with the number of tasks. To tackle this challenge, we integrate the decoder-free vision-language model CLIP, which exhibits robust zero-shot generalization capability. Recently, parameter-efficient transfer learning methods have been extensively explored with CLIP for adapting to downstream tasks, where prompt tuning showcases strong potential. Nevertheless, these methods fine-tune only a single modality (text or visual), disrupting the modality structure of CLIP. In this paper, we first propose Multi-Modal Alignment Prompt (MmAP) for CLIP, which aligns the text and visual modalities during the fine-tuning process. Building upon MmAP, we develop an innovative multi-task prompt learning framework. On the one hand, to maximize the complementarity of tasks with high similarity, we utilize a gradient-driven task grouping method that partitions tasks into several disjoint groups and assigns a group-shared MmAP to each group. On the other hand, to preserve the unique characteristics of each task, we assign a task-specific MmAP to each task. Comprehensive experiments on two large multi-task learning datasets demonstrate that our method achieves significant performance improvements compared to full fine-tuning while utilizing only approximately 0.09% of trainable parameters.
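The grouping step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure or partitioning procedure is used, so everything here is an assumption for illustration: cosine similarity between per-task gradients, a greedy partition, the `threshold` value, and the names `cosine`, `group_tasks`, and `toy_grads` are all hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_tasks(task_grads, threshold=0.5):
    """Greedily partition tasks into disjoint groups.

    ASSUMPTION: a task joins the first existing group in which every member's
    gradient has cosine similarity >= threshold with its own; otherwise it
    starts a new group. The paper's actual grouping rule may differ.
    """
    groups = []
    for name, grad in task_grads.items():
        for group in groups:
            if all(cosine(grad, task_grads[m]) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical gradients of each task's loss with respect to shared prompt parameters.
toy_grads = {
    "task_a": [1.0, 0.9, 0.0],
    "task_b": [0.9, 1.0, 0.1],
    "task_c": [-1.0, 0.2, 0.8],
}

groups = group_tasks(toy_grads)
print(groups)
```

Under these assumptions, each resulting group would receive one group-shared prompt, and every task would additionally keep its own task-specific prompt, mirroring the two levels described in the abstract.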

Cite

Text

Xin et al. "MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I14.29540

Markdown

[Xin et al. "MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/xin2024aaai-mmap/) doi:10.1609/AAAI.V38I14.29540

BibTeX

@inproceedings{xin2024aaai-mmap,
  title     = {{MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning}},
  author    = {Xin, Yi and Du, Junlong and Wang, Qiang and Yan, Ke and Ding, Shouhong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {16076-16084},
  doi       = {10.1609/AAAI.V38I14.29540},
  url       = {https://mlanthology.org/aaai/2024/xin2024aaai-mmap/}
}