MTFormer: Multi-Task Learning via Transformer and Cross-Task Reasoning
Abstract
In this paper, we explore the advantages of utilizing transformer structures for multi-task learning (MTL). Specifically, we demonstrate that models with transformer structures are better suited to MTL than convolutional neural networks (CNNs), and we propose a novel transformer-based architecture named MTFormer for MTL. In this framework, multiple tasks share the same transformer encoder and transformer decoder, and lightweight branches are introduced to harvest task-specific outputs, which improves MTL performance while reducing time-space complexity. Furthermore, since information from different task domains can benefit one another, we conduct cross-task reasoning: we propose a cross-task attention mechanism that further boosts MTL results while adding few parameters and little computation. In addition, we design a self-supervised cross-task contrastive learning algorithm to further improve MTL performance. Extensive experiments on two multi-task learning datasets show that MTFormer achieves state-of-the-art results with limited network parameters and computation, and it also demonstrates significant advantages in few-shot and zero-shot learning.
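The abstract describes a shared encoder-decoder trunk with lightweight task-specific branches, fused via cross-task attention. The following is a minimal NumPy sketch of that general idea only, not the paper's actual implementation: all names, shapes, and the single-projection "encoder" and "heads" are illustrative assumptions, and the cross-task attention shown is plain scaled dot-product attention where one task's features query another's.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_tokens, d = 16, 8  # hypothetical token count and channel width

# Tokens produced by a (stand-in) shared transformer encoder.
tokens = rng.standard_normal((n_tokens, d))
W_shared = rng.standard_normal((d, d)) / np.sqrt(d)
shared_feat = tokens @ W_shared  # one trunk shared by all tasks

# Lightweight task-specific branches: here, one linear layer per task.
branches = {t: rng.standard_normal((d, d)) / np.sqrt(d)
            for t in ("seg", "depth")}
task_feat = {t: shared_feat @ W for t, W in branches.items()}

def cross_task_attention(q_feat, kv_feat):
    """Queries from one task attend over another task's features,
    so each task can borrow information from its peer."""
    scores = q_feat @ kv_feat.T / np.sqrt(q_feat.shape[-1])
    return softmax(scores, axis=-1) @ kv_feat

# Residual fusion: segmentation features enriched by depth features.
fused_seg = task_feat["seg"] + cross_task_attention(
    task_feat["seg"], task_feat["depth"])
```

Because the attention module reuses the task features themselves as queries, keys, and values, it adds no extra weight matrices in this sketch, which loosely mirrors the abstract's claim that cross-task attention introduces few parameters.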
Cite
Text
Xu et al. "MTFormer: Multi-Task Learning via Transformer and Cross-Task Reasoning." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19812-0_18

Markdown
[Xu et al. "MTFormer: Multi-Task Learning via Transformer and Cross-Task Reasoning." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/xu2022eccv-mtformer/) doi:10.1007/978-3-031-19812-0_18

BibTeX
@inproceedings{xu2022eccv-mtformer,
title = {{MTFormer: Multi-Task Learning via Transformer and Cross-Task Reasoning}},
author = {Xu, Xiaogang and Zhao, Hengshuang and Vineet, Vibhav and Lim, Ser-Nam and Torralba, Antonio},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2022},
doi = {10.1007/978-3-031-19812-0_18},
url = {https://mlanthology.org/eccv/2022/xu2022eccv-mtformer/}
}