VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding

Abstract

Large-scale pre-trained models have achieved remarkable success in various computer vision tasks. A standard approach to leveraging these models is to fine-tune all model parameters for downstream tasks, which poses challenges in terms of computational and storage costs. Recently, inspired by Natural Language Processing (NLP), parameter-efficient transfer learning has been successfully applied to vision tasks. However, most existing techniques focus on single-task adaptation, and the few methods that address multi-task adaptation often exhibit suboptimal training and inference efficiency. In this paper, we first propose a once-for-all Vision Multi-Task Adapter (VMT-Adapter), which achieves approximately O(1) training and inference efficiency with respect to the number of tasks. Concretely, VMT-Adapter shares knowledge across multiple tasks to enhance cross-task interaction while preserving task-specific knowledge via independent knowledge extraction modules. Notably, since the task-specific modules require few parameters, VMT-Adapter can handle an arbitrary number of tasks with a negligible increase in trainable parameters. We also propose VMT-Adapter-Lite, which further reduces the trainable parameters by learning shared parameters between the down- and up-projections. Extensive experiments on four dense scene understanding tasks demonstrate the superiority of VMT-Adapter(-Lite), which achieves a 3.96% (1.34%) relative improvement over single-task full fine-tuning while using merely ~1% (0.36%) of the pre-trained model's parameters as trainable parameters.
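
To make the parameter accounting in the abstract concrete, here is a minimal sketch of a shared adapter with small per-task modules. It is not the authors' implementation: the abstract does not specify the form of the knowledge extraction modules, so the per-task scale and shift vectors, the ReLU bottleneck, the zero-initialized up-projection, and all names and sizes below (`SharedAdapterSketch`, `dim=768`, `bottleneck=64`) are assumptions chosen for illustration only.

```python
import numpy as np


class SharedAdapterSketch:
    """Hypothetical multi-task adapter: one shared bottleneck, tiny per-task vectors."""

    def __init__(self, dim, bottleneck, num_tasks, seed=0):
        rng = np.random.default_rng(seed)
        # Shared down- and up-projections (assumed form of the shared knowledge).
        self.down = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        self.up = np.zeros((bottleneck, dim))  # zero init: adapter starts as identity
        # Assumed task-specific modules: one scale and one shift vector per task.
        self.scale = np.ones((num_tasks, bottleneck))
        self.shift = np.zeros((num_tasks, bottleneck))

    def forward(self, x):
        """x: (tokens, dim) -> (num_tasks, tokens, dim)."""
        # The shared down-projection is computed once, regardless of task count.
        h = np.maximum(x @ self.down, 0.0)  # (tokens, bottleneck)
        # Cheap per-task modulation via broadcasting over the task axis.
        h_tasks = h[None] * self.scale[:, None, :] + self.shift[:, None, :]
        # Residual connection around the shared up-projection.
        return x[None] + h_tasks @ self.up

    def num_params(self):
        """Return (shared parameter count, total task-specific parameter count)."""
        shared = self.down.size + self.up.size
        task_specific = self.scale.size + self.shift.size
        return shared, task_specific


if __name__ == "__main__":
    for num_tasks in (4, 8):
        adapter = SharedAdapterSketch(dim=768, bottleneck=64, num_tasks=num_tasks)
        shared, task_specific = adapter.num_params()
        print(num_tasks, shared, task_specific)
```

Under these assumed sizes, the shared projections hold 2 × 768 × 64 = 98,304 parameters, while each additional task adds only 2 × 64 = 128, which illustrates why the trainable parameter count can grow negligibly with the number of tasks.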

Cite

Text

Xin et al. "VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I14.29541

Markdown

[Xin et al. "VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/xin2024aaai-vmt/) doi:10.1609/AAAI.V38I14.29541

BibTeX

@inproceedings{xin2024aaai-vmt,
  title     = {{VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding}},
  author    = {Xin, Yi and Du, Junlong and Wang, Qiang and Lin, Zhiwen and Yan, Ke},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {16085-16093},
  doi       = {10.1609/AAAI.V38I14.29541},
  url       = {https://mlanthology.org/aaai/2024/xin2024aaai-vmt/}
}