Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learner

Abstract

Multi-Task Learning (MTL) for Vision Transformers aims to enhance model capability by tackling multiple tasks simultaneously. Most recent works predominantly focus on designing Mixture-of-Experts (MoE) structures and integrating Low-Rank Adaptation (LoRA) to perform multi-task learning efficiently. However, rigidly combining the two hampers both the optimization of MoE and the effectiveness of LoRA's reparameterization, leading to sub-optimal performance and low inference speed. In this work, we propose a novel approach dubbed Efficient Multi-Task Learning (EMTAL), which transforms a pre-trained Vision Transformer into an efficient multi-task learner during training and reparameterizes the learned structure for efficient inference. Specifically, we first develop the MoEfied LoRA structure, which decomposes the pre-trained Transformer into a low-rank MoE structure and employs LoRA to fine-tune the parameters. Subsequently, accounting for the intrinsically asynchronous nature of multi-task learning, we devise a learning Quality Retaining (QR) optimization mechanism that leverages historical high-quality class logits to prevent well-trained tasks from degrading. Finally, we design a router fading strategy that integrates the learned parameters back into the original Transformer, achieving efficient inference. Extensive experiments on public benchmarks demonstrate the superiority of our method over state-of-the-art multi-task learning approaches.
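
The abstract compresses three mechanisms into one paragraph. The PyTorch sketch below illustrates how they could fit together under our own assumptions: the class and function names (`MoEfiedLoRAFFN`, `merge`, `quality_retaining_loss`), shapes, and the distillation-style QR loss are all hypothetical readings of the abstract, not the authors' released code or exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEfiedLoRAFFN(nn.Module):
    """Sketch of a ViT FFN whose hidden units are split into experts, each
    carrying its own LoRA update; a task-conditioned router mixes experts
    during training. All names and shapes are illustrative assumptions."""

    def __init__(self, d_model=768, d_ff=3072, n_experts=4, rank=8, n_tasks=3):
        super().__init__()
        assert d_ff % n_experts == 0
        self.n_experts, self.d_chunk = n_experts, d_ff // n_experts
        # Frozen stand-ins for the pre-trained up/down projections.
        self.w_up = nn.Parameter(0.02 * torch.randn(d_ff, d_model), requires_grad=False)
        self.w_down = nn.Parameter(0.02 * torch.randn(d_model, d_ff), requires_grad=False)
        # One low-rank (LoRA) update per expert on the up-projection.
        self.lora_a = nn.Parameter(0.02 * torch.randn(n_experts, rank, d_model))
        self.lora_b = nn.Parameter(torch.zeros(n_experts, self.d_chunk, rank))
        # Task-conditioned router over the experts.
        self.router = nn.Embedding(n_tasks, n_experts)

    def forward(self, x, task_id, fade=0.0):
        # Router fading: blend the learned gate toward uniform as fade -> 1,
        # so the gated LoRA deltas become foldable into the dense weights.
        gate = torch.softmax(self.router.weight[task_id], dim=-1)
        gate = (1.0 - fade) * gate + fade / self.n_experts
        h = x @ self.w_up.t()  # dense pre-trained path, (..., d_ff)
        h = h.reshape(*x.shape[:-1], self.n_experts, self.d_chunk)
        down = torch.einsum('erd,...d->...er', self.lora_a, x)
        delta = torch.einsum('ecr,...er->...ec', self.lora_b, down)
        h = h + gate.view(self.n_experts, 1) * delta
        h = F.gelu(h.reshape(*x.shape[:-1], -1))
        return h @ self.w_down.t()

    @torch.no_grad()
    def merge(self):
        # End point of router fading: with a uniform gate, each expert's LoRA
        # delta folds into w_up, recovering a plain FFN for fast inference.
        delta = torch.einsum('ecr,erd->ecd', self.lora_b, self.lora_a) / self.n_experts
        self.w_up += delta.reshape(self.w_up.shape)
        self.lora_b.zero_()  # keep forward() consistent after merging


def quality_retaining_loss(logits, best_logits, tau=2.0):
    """One plausible reading of QR: distill from the best historical logits
    of a task so it does not degrade while slower tasks keep training."""
    p = F.log_softmax(logits / tau, dim=-1)
    q = F.softmax(best_logits / tau, dim=-1)
    return F.kl_div(p, q, reduction='batchmean') * tau * tau
```

Note the design constraint the sketch makes explicit: the per-expert LoRA deltas fold exactly into the dense weights only when the gate is uniform, which is presumably why the router is gradually faded rather than simply dropped at deployment.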

Cite

Text

Zhong et al. "Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learner." Neural Information Processing Systems, 2024. doi:10.52202/079017-2579

Markdown

[Zhong et al. "Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learner." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/zhong2024neurips-transforming/) doi:10.52202/079017-2579

BibTeX

@inproceedings{zhong2024neurips-transforming,
  title     = {{Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learner}},
  author    = {Zhong, Hanwen and Chen, Jiaxin and Zhang, Yutong and Huang, Di and Wang, Yunhong},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2579},
  url       = {https://mlanthology.org/neurips/2024/zhong2024neurips-transforming/}
}