Mini but Mighty: Finetuning ViTs with Mini Adapters
Abstract
Vision Transformers (ViTs) have become one of the dominant architectures in computer vision, and pre-trained ViT models are commonly adapted to new tasks via fine-tuning. Recent works have proposed several parameter-efficient transfer learning methods, such as adapters, to avoid the prohibitive training and storage costs of full fine-tuning. In this work, we observe that adapters perform poorly when their hidden dimension is small, and we propose MiMi, a training framework that addresses this issue. We start with large adapters, which can reach high performance, and iteratively reduce the size of every adapter. We introduce a scoring function that compares neuron importance across layers and thereby automatically estimates the hidden dimension of every adapter. Our method achieves a better trade-off between accuracy and number of trained parameters than existing methods on three benchmarks, DomainNet, VTAB, and Multi-task, covering 29 datasets in total. We will release our code publicly upon acceptance.
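The abstract does not spell out the scoring function or the pruning schedule, so the following is only a minimal PyTorch sketch of the overall recipe, not the authors' implementation. The bottleneck `Adapter` module is the standard design; the magnitude-based proxy in `neuron_scores`, its per-layer normalization, and the global top-k selection in `mimi_step` are illustrative assumptions standing in for the paper's actual scoring function.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus residual."""

    def __init__(self, d_model: int, hidden_dim: int):
        super().__init__()
        self.down = nn.Linear(d_model, hidden_dim)
        self.up = nn.Linear(hidden_dim, d_model)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


def neuron_scores(adapter: Adapter) -> torch.Tensor:
    # Hypothetical magnitude-based proxy: score hidden neuron j by the norms
    # of the weights entering and leaving it, normalized within the layer so
    # that scores are comparable across layers.
    w_in = adapter.down.weight.norm(dim=1)   # shape (hidden_dim,)
    w_out = adapter.up.weight.norm(dim=0)    # shape (hidden_dim,)
    s = w_in * w_out
    return s / (s.sum() + 1e-12)


def shrink(adapter: Adapter, keep: torch.Tensor) -> Adapter:
    """Rebuild an adapter, keeping only the hidden neurons indexed by `keep`."""
    new = Adapter(adapter.down.in_features, keep.numel())
    with torch.no_grad():
        new.down.weight.copy_(adapter.down.weight[keep])
        new.down.bias.copy_(adapter.down.bias[keep])
        new.up.weight.copy_(adapter.up.weight[:, keep])
        new.up.bias.copy_(adapter.up.bias)
    return new


def mimi_step(adapters: list[Adapter], keep_ratio: float) -> list[Adapter]:
    # Rank the hidden neurons of all adapters jointly and keep the top
    # fraction; each adapter thus ends up with its own hidden dimension.
    scores = [neuron_scores(a) for a in adapters]
    flat = torch.cat(scores)
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = flat.topk(k).values.min()
    shrunk = []
    for a, s in zip(adapters, scores):
        keep = (s >= threshold).nonzero(as_tuple=True)[0]
        if keep.numel() == 0:  # assumption: never remove an adapter entirely
            keep = s.argmax().unsqueeze(0)
        shrunk.append(shrink(a, keep))
    return shrunk


# Start with wide adapters (one per transformer block), fine-tune, shrink,
# and repeat until the parameter budget is met.
adapters = [Adapter(d_model=768, hidden_dim=64) for _ in range(12)]
adapters = mimi_step(adapters, keep_ratio=0.5)
print([a.down.out_features for a in adapters])  # per-adapter hidden dimensions
```

Because the threshold is computed over all layers at once rather than per adapter, layers whose neurons matter more retain wider bottlenecks, which is how the width of every adapter can be estimated automatically rather than set by hand.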
Cite

Text

Marouf et al. "Mini but Mighty: Finetuning ViTs with Mini Adapters." Winter Conference on Applications of Computer Vision, 2024.

Markdown

[Marouf et al. "Mini but Mighty: Finetuning ViTs with Mini Adapters." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/marouf2024wacv-mini/)

BibTeX
@inproceedings{marouf2024wacv-mini,
title = {{Mini but Mighty: Finetuning ViTs with Mini Adapters}},
author = {Marouf, Imad Eddine and Tartaglione, Enzo and Lathuilière, Stéphane},
booktitle = {Winter Conference on Applications of Computer Vision},
year = {2024},
pages = {1732--1741},
url = {https://mlanthology.org/wacv/2024/marouf2024wacv-mini/}
}