ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models

Abstract

The growing popularity of Contrastive Language-Image Pretraining (CLIP) has led to its widespread application in various visual downstream tasks. To enhance CLIP's effectiveness and versatility, efficient few-shot adaptation techniques have been widely adopted. Among these approaches, training-free methods, particularly caching methods exemplified by Tip-Adapter, have gained attention for their lightweight adaptation without the need for additional fine-tuning. In this paper, we revisit Tip-Adapter from a kernel perspective, showing that caching methods function as local adapters and are connected to the well-established literature on kernel methods. Drawing on this insight, we offer a theoretical understanding of how these methods operate and suggest multiple avenues for enhancing the Tip-Adapter baseline. Notably, our analysis shows the importance of incorporating global information in local adapters. We therefore propose a global method that learns a proximal regularizer in a reproducing kernel Hilbert space (RKHS) using CLIP as a base learner. Our method, which we call ProKeR (Proximal Kernel ridge Regression), has a closed-form solution and achieves state-of-the-art performance across 11 datasets in the standard few-shot adaptation benchmark. Code is available at https://ybendou.github.io/ProKeR/.
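
The abstract does not state the closed-form solution, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that "proximal" kernel ridge regression with CLIP as a base learner amounts to fitting a kernel ridge regressor on the residual between the few-shot labels and the zero-shot CLIP logits, then adding that correction back to the zero-shot prediction. The RBF kernel, the names `rbf_kernel`, `fit_residual_krr`, and `predict`, and the parameters `lam` and `beta` are all hypothetical choices made for illustration; the feature arrays stand in for precomputed CLIP image and text embeddings.

```python
import numpy as np


def rbf_kernel(A, B, beta=1.0):
    """Gaussian kernel exp(-beta * ||a - b||^2) between rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-beta * np.maximum(sq, 0.0))


def zero_shot_logits(X, W):
    """Base learner: similarity between image features X and class text features W."""
    return X @ W.T


def fit_residual_krr(X_train, Y_train, W, lam=1.0, beta=1.0):
    """Closed-form ridge solution on the residual Y - f_clip(X) (assumed form)."""
    K = rbf_kernel(X_train, X_train, beta)
    residual = Y_train - zero_shot_logits(X_train, W)
    # Solve (K + lam * I) alpha = residual without forming an explicit inverse.
    return np.linalg.solve(K + lam * np.eye(len(X_train)), residual)


def predict(X_test, X_train, alpha, W, beta=1.0):
    """Zero-shot prediction plus the kernel correction learned from the shots."""
    return zero_shot_logits(X_test, W) + rbf_kernel(X_test, X_train, beta) @ alpha
```

Under this assumed form, `lam` controls how far the adapted predictor may move from the base learner: a very large `lam` shrinks the correction towards zero and recovers the zero-shot logits, while a very small `lam` lets the predictor interpolate the few-shot labels.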

Cite

Text

Bendou et al. "ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02336

Markdown

[Bendou et al. "ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/bendou2025cvpr-proker/) doi:10.1109/CVPR52734.2025.02336

BibTeX

@inproceedings{bendou2025cvpr-proker,
  title     = {{ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models}},
  author    = {Bendou, Yassir and Ouasfi, Amine and Gripon, Vincent and Boukhayma, Adnane},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {25092-25102},
  doi       = {10.1109/CVPR52734.2025.02336},
  url       = {https://mlanthology.org/cvpr/2025/bendou2025cvpr-proker/}
}