Low-Rank Few-Shot Adaptation of Vision-Language Models

Abstract

Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further pushed their generalization capabilities, at the expense of just a few labeled samples within the target downstream task. However, this promising, already quite abundant few-shot literature has focused principally on prompt learning and, to a lesser extent, on adapters, overlooking the recent advances in Parameter-Efficient Fine-Tuning (PEFT). Furthermore, existing few-shot learning methods for VLMs often rely on heavy training procedures and/or carefully chosen, task-specific hyper-parameters, which might impede their applicability. In response, we introduce Low-Rank Adaptation (LoRA) in few-shot learning for VLMs, and show its potential on 11 datasets, in comparison to current state-of-the-art prompt- and adapter-based approaches. Surprisingly, our simple CLIP-LoRA method exhibits substantial improvements, while reducing the training times and keeping the same hyper-parameters in all the target tasks, i.e., across all the datasets and numbers of shots. Certainly, our surprising results do not dismiss the potential of prompt-learning and adapter-based research. However, we believe that our strong baseline could be used to evaluate progress in these emergent subjects in few-shot VLMs.
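
For readers unfamiliar with LoRA, below is a minimal PyTorch sketch of the low-rank reparameterization the abstract refers to: a frozen pre-trained linear layer is augmented with a trainable rank-r update, y = W x + (alpha / r) · B A x. The class name `LoRALinear`, the rank, and the scaling are illustrative assumptions for this sketch; it is not the authors' released CLIP-LoRA code.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x, with A of shape (r, d_in) and B of shape (d_out, r)."""

    def __init__(self, base: nn.Linear, r: int = 2, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # keep the pre-trained weights frozen
            p.requires_grad = False
        self.lora_A = nn.Parameter(0.01 * torch.randn(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)


# Hypothetical usage: wrap one projection of a CLIP attention block.
proj = nn.Linear(512, 512)
lora_proj = LoRALinear(proj, r=2)
print(lora_proj(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```

Only `lora_A` and `lora_B` receive gradients, which is what keeps this style of adaptation lightweight in trainable parameters compared with full fine-tuning.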

Cite

Text

Zanella and Ben Ayed. "Low-Rank Few-Shot Adaptation of Vision-Language Models." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00166

Markdown

[Zanella and Ben Ayed. "Low-Rank Few-Shot Adaptation of Vision-Language Models." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/zanella2024cvprw-lowrank/) doi:10.1109/CVPRW63382.2024.00166

BibTeX

@inproceedings{zanella2024cvprw-lowrank,
  title     = {{Low-Rank Few-Shot Adaptation of Vision-Language Models}},
  author    = {Zanella, Maxime and Ben Ayed, Ismail},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {1593--1603},
  doi       = {10.1109/CVPRW63382.2024.00166},
  url       = {https://mlanthology.org/cvprw/2024/zanella2024cvprw-lowrank/}
}