MTSAM: Multi-Task Fine-Tuning for Segment Anything Model
Abstract
The Segment Anything Model (SAM), with its remarkable zero-shot capability, has the potential to be a foundation model for multi-task learning. However, adapting SAM to multi-task learning faces two challenges: (a) SAM has difficulty generating task-specific outputs with different channel numbers, and (b) how to fine-tune SAM to adapt to multiple downstream tasks simultaneously remains unexplored. To address these two challenges, in this paper, we propose the Multi-Task SAM (MTSAM) framework, which enables SAM to work as a foundation model for multi-task learning. MTSAM modifies SAM's architecture by removing the prompt encoder and implementing task-specific no-mask embeddings and mask decoders, enabling the generation of task-specific outputs. Furthermore, we introduce Tensorized low-Rank Adaptation (ToRA) to perform multi-task fine-tuning on SAM. Specifically, ToRA injects an update parameter tensor into each layer of the encoder in SAM and leverages a low-rank tensor decomposition method to incorporate both task-shared and task-specific information. Extensive experiments conducted on benchmark datasets substantiate the efficacy of MTSAM in enhancing the performance of multi-task learning. Our code is available at https://github.com/XuehaoWangFi/MTSAM.
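To make the idea of a task-conditioned low-rank update concrete, below is a minimal PyTorch sketch of a ToRA-style adapter wrapped around a frozen linear layer. The factorization shown (shared factors A and B plus a per-task scaling vector) is an illustrative CP-style parameterization inspired by the abstract, not the paper's exact formulation; the class and parameter names are hypothetical. See the official repository linked above for the authors' implementation.

import torch
import torch.nn as nn

class ToRALinear(nn.Module):
    """Illustrative ToRA-style adapter around a frozen linear layer.

    The per-task weight update is assembled from a rank-r factorization:
    shared factors (A, B) capture task-shared structure, while a per-task
    vector s_t scales the rank-1 components, giving task-specific updates.
    This is a hedged sketch, not the paper's exact parameterization.
    """

    def __init__(self, base: nn.Linear, num_tasks: int, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep the pretrained SAM encoder weights frozen
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.zeros(d_out, rank))         # shared factor (zero-init so the update starts at zero)
        self.B = nn.Parameter(torch.randn(rank, d_in) * 0.01)   # shared factor
        self.S = nn.Parameter(torch.ones(num_tasks, rank))      # task-specific scales

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Delta W_t = A diag(s_t) B : a low-rank, task-conditioned update
        delta_w = self.A @ torch.diag(self.S[task_id]) @ self.B
        return self.base(x) + nn.functional.linear(x, delta_w)

if __name__ == "__main__":
    layer = ToRALinear(nn.Linear(768, 768), num_tasks=3, rank=8)
    x = torch.randn(4, 768)
    print(layer(x, task_id=1).shape)  # torch.Size([4, 768])

Only the small factors A, B, and S are trainable, so the number of fine-tuned parameters stays far below that of the frozen encoder while still allowing each task its own effective weight update.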
Cite
Text
Wang et al. "MTSAM: Multi-Task Fine-Tuning for Segment Anything Model." International Conference on Learning Representations, 2025.
Markdown
[Wang et al. "MTSAM: Multi-Task Fine-Tuning for Segment Anything Model." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/wang2025iclr-mtsam/)
BibTeX
@inproceedings{wang2025iclr-mtsam,
title = {{MTSAM: Multi-Task Fine-Tuning for Segment Anything Model}},
author = {Wang, Xuehao and Zhuang, Zhan and Ye, Feiyang and Zhang, Yu},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/wang2025iclr-mtsam/}
}