UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-Modal Modeling

Abstract

Large-scale vision-language pre-trained models have shown promising transferability to various downstream tasks. As the size of these foundation models and the number of downstream tasks grow, the standard full fine-tuning paradigm becomes unsustainable due to heavy computational and storage costs. This paper proposes UniAdapter, which unifies unimodal and multimodal adapters for parameter-efficient cross-modal adaptation on pre-trained vision-language models. Specifically, adapters are distributed across the different modalities and their interactions, and the total number of tunable parameters is reduced through partial weight sharing. This unified, knowledge-sharing design yields powerful cross-modal representations that benefit various downstream tasks while requiring only 1.0%-2.0% of the pre-trained model's parameters to be tuned. Extensive experiments on 7 cross-modal downstream benchmarks (covering video-text retrieval, image-text retrieval, VideoQA, VQA, and captioning) show that in most cases UniAdapter not only outperforms state-of-the-art methods but even beats the full fine-tuning strategy. In particular, on the MSRVTT retrieval task, UniAdapter achieves 49.7% recall@1 while tuning only 2.2% of the model parameters, outperforming the latest competitors by 2.0%. The code and models are available at https://github.com/RERV/UniAdapter.
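To make the weight-sharing idea concrete, below is a minimal PyTorch sketch of a bottleneck adapter in which a single down-projection is shared across visual, textual, and cross-modal branches, each with its own up-projection. All module names, dimensions, and the choice of activation here are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class UniAdapterSketch(nn.Module):
    """Sketch of a unified bottleneck adapter with partial weight sharing.

    One down-projection is shared across the visual, textual, and
    cross-modal branches; each branch keeps its own up-projection.
    Names and dimensions are hypothetical, for illustration only.
    """

    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 128):
        super().__init__()
        # Shared down-projection: the source of the parameter savings.
        self.shared_down = nn.Linear(hidden_dim, bottleneck_dim)
        self.activation = nn.ReLU()
        # Modality-specific up-projections.
        self.up = nn.ModuleDict({
            m: nn.Linear(bottleneck_dim, hidden_dim)
            for m in ("visual", "textual", "cross")
        })

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        # Standard adapter form: bottleneck transform plus a residual connection.
        return x + self.up[modality](self.activation(self.shared_down(x)))
```

In use, the pre-trained backbone would be frozen and only adapter weights of this kind updated, which is how a tunable-parameter budget in the 1.0%-2.0% range, as reported in the abstract, could arise.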

Cite

Text

Lu et al. "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-Modal Modeling." International Conference on Learning Representations, 2024.

Markdown

[Lu et al. "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-Modal Modeling." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/lu2024iclr-uniadapter/)

BibTeX

@inproceedings{lu2024iclr-uniadapter,
  title     = {{UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-Modal Modeling}},
  author    = {Lu, Haoyu and Huo, Yuqi and Yang, Guoxing and Lu, Zhiwu and Zhan, Wei and Tomizuka, Masayoshi and Ding, Mingyu},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/lu2024iclr-uniadapter/}
}