Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

Abstract

Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while keeping the majority of pre-trained parameters frozen. A key challenge is striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features. Little existing work focuses on guiding this delicate trade-off. In this study, we approach the problem from the perspective of the Singular Value Decomposition (SVD) of pre-trained parameter matrices, which provides insights into the tuning dynamics of existing methods. Building on this understanding, we propose a Residual-based Low-Rank Rescaling (RLRR) fine-tuning strategy. Through a residual design, this strategy not only enhances flexibility in parameter tuning but also ensures that the new parameters do not deviate excessively from the pre-trained model. Extensive experiments demonstrate that our method achieves competitive performance across various downstream image classification tasks while using a comparable number of new parameters. We believe this work takes a step forward in offering a unified perspective for interpreting existing methods, and that it can motivate new approaches that more effectively address the trade-off described above. Our code is available at https://github.com/zstarN70/RLRR.git.
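The abstract does not state the RLRR update rule, so the following is only a minimal sketch of what a residual low-rank rescaling of a frozen weight matrix could look like. It is not the authors' implementation. The function name `residual_rescale`, the vectors `s_left` and `s_right`, and the specific form `W + (s_left s_right^T) ⊙ W` are all assumptions made for illustration; the linked repository is the place to look for the actual formulation.

```python
from typing import List

Matrix = List[List[float]]


def residual_rescale(W: Matrix, s_left: List[float], s_right: List[float]) -> Matrix:
    """Hypothetical residual rescaling: W + (s_left s_right^T) * W, elementwise.

    W stays frozen; only s_left (one value per row) and s_right (one value
    per column) would be trained. Their outer product is a rank-1 scaling
    mask, and the "+ W" term is the residual path back to the pre-trained
    weights.
    """
    if len(s_left) != len(W) or any(len(row) != len(s_right) for row in W):
        raise ValueError("scale vectors must match the shape of W")
    return [
        [w * (1.0 + s_left[i] * s_right[j]) for j, w in enumerate(row)]
        for i, row in enumerate(W)
    ]


# Toy frozen "pre-trained" matrix (illustrative values only).
W = [[1.0, -2.0, 0.5],
     [0.0, 3.0, 4.0]]

# With one scale vector initialised to zero, the adapted weights equal W,
# so tuning starts exactly at the pre-trained model.
identity = residual_rescale(W, [0.0, 0.0], [1.0, 1.0, 1.0])

# Non-zero scales move the weights away from W in a structured way.
tuned = residual_rescale(W, [0.5, -1.0], [1.0, 0.0, 2.0])
```

In this sketch, a matrix with `d_out` rows and `d_in` columns needs only `d_out + d_in` new parameters. For a 768 x 768 projection that is 1,536 values, compared with 589,824 for full fine-tuning of the same matrix.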

Cite

Text

Dong et al. "Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01524

BibTeX

@inproceedings{dong2024cvpr-lowrank,
  title     = {{Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach}},
  author    = {Dong, Wei and Zhang, Xing and Chen, Bihui and Yan, Dawei and Lin, Zhijun and Yan, Qingsen and Wang, Peng and Yang, Yang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {16101--16110},
  doi       = {10.1109/CVPR52733.2024.01524},
  url       = {https://mlanthology.org/cvpr/2024/dong2024cvpr-lowrank/}
}