$\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-Task Interpolation

Abstract

Foundation models have achieved great advances in multi-task learning with a unified interface for unimodal and multimodal tasks. However, the potential of such multi-task learners has not been exploited during transfer learning. In this work, we present a universal parameter-efficient transfer learning method, termed Predict-Interpolate Tuning ($\pi$-Tuning), for vision, language, and vision-language tasks. It aggregates the parameters of lightweight task-specific experts learned from similar tasks to aid the target downstream task. The task similarities are predicted in a unified modality-independent space, yielding a scalable graph that demonstrates task relationships. $\pi$-Tuning has several appealing benefits. First, it flexibly explores both intra- and inter-modal transferability between similar tasks to improve the accuracy and robustness of transfer learning, especially in data-scarce scenarios. Second, it offers a systematic solution for transfer learning with multi-task prediction-and-then-interpolation, compatible with diverse types of parameter-efficient experts, such as prompts and adapters. Third, an extensive study of task-level mutual benefits on 14 unimodal and 6 multimodal datasets shows that $\pi$-Tuning surpasses fine-tuning and other parameter-efficient transfer learning methods in both full-shot and low-shot regimes. The task graph also enables an in-depth interpretable analysis of task transferability across modalities. The code will be available at https://github.com/TencentARC/pi-Tuning.
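The predict-and-then-interpolate idea from the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: all function and variable names (`predict_similarity`, `interpolate_experts`, the expert dictionaries) are hypothetical, and the sketch assumes each expert is a small set of scalar parameters with a task embedding in a shared space; similarity scores select the most related experts, and a softmax over those scores gives the interpolation weights.

```python
# Hypothetical sketch of predict-then-interpolate over task-specific experts.
# Each expert holds (a) a task embedding in a shared modality-independent
# space and (b) parameter-efficient weights (e.g., adapter parameters).
import math


def softmax(scores):
    """Turn similarity scores into interpolation weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_similarity(target_emb, expert_emb):
    """Cosine similarity between two task embeddings."""
    dot = sum(a * b for a, b in zip(target_emb, expert_emb))
    na = math.sqrt(sum(a * a for a in target_emb))
    nb = math.sqrt(sum(b * b for b in expert_emb))
    return dot / (na * nb)


def interpolate_experts(target_emb, experts, top_k=2):
    """Select the top-k experts most similar to the target task and
    average their parameters with similarity-derived weights."""
    ranked = sorted(
        experts,
        key=lambda e: predict_similarity(target_emb, e["emb"]),
        reverse=True,
    )[:top_k]
    weights = softmax(
        [predict_similarity(target_emb, e["emb"]) for e in ranked]
    )
    merged = {}
    for name in ranked[0]["params"]:
        merged[name] = sum(w * e["params"][name] for w, e in zip(weights, ranked))
    return merged
```

The merged parameters would then initialize (or be jointly tuned with) the target task's own expert; the weighted average keeps each merged parameter inside the convex hull of the selected experts' values.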

Cite

Text

Wu et al. "$\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-Task Interpolation." International Conference on Machine Learning, 2023.

Markdown

[Wu et al. "$\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-Task Interpolation." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/wu2023icml-tuning/)

BibTeX

@inproceedings{wu2023icml-tuning,
  title     = {{$\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-Task Interpolation}},
  author    = {Wu, Chengyue and Wang, Teng and Ge, Yixiao and Lu, Zeyu and Zhou, Ruisong and Shan, Ying and Luo, Ping},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {37713--37727},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/wu2023icml-tuning/}
}