Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting

Abstract

CLIP is a foundation model with transferable classification performance in the few-shot setting, and several methods have improved CLIP's performance using few-shot examples. However, all of these techniques have so far been benchmarked on standard few-shot datasets. We argue that this mode of evaluation does not give a true indication of inductive generalization from few-shot examples: because most of these datasets were seen by the CLIP model during pre-training, the resulting setting is better described as partially transductive. To address this, we propose a pipeline that uses an unlearning technique to obtain true inductive baselines. In this new inductive setting, methods show a significant drop in performance (55% on average across 13 baselines and multiple datasets). We validate the unlearning technique using oracle baselines. We further propose an improved few-shot classification technique that consistently achieves state-of-the-art performance over 13 recent baseline methods in a comprehensive analysis of 5880 experiments, varying the dataset, the number of few-shot examples, the unlearning setting, and the random seed. In summary, we identify a flaw in the evaluation of CLIP-based few-shot classification, provide a solution using unlearning, propose new benchmarks, and present an improved method.
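
For context, below is a minimal sketch of the standard few-shot CLIP evaluation protocol that the abstract critiques: build a classifier from K labeled examples per class on top of frozen CLIP embeddings and score it on held-out queries. It is illustrative only; the backbone choice (ViT-B-32 via open_clip) and the nearest-class-mean classifier are assumptions, and this is neither the authors' unlearning pipeline nor their proposed method. Note that the embeddings come from the original pre-trained CLIP weights, which is precisely why the paper calls this setting partially transductive.

```python
import torch
import open_clip

device = "cuda" if torch.cuda.is_available() else "cpu"
# Assumed backbone for illustration; any pre-trained CLIP variant works.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai"
)
model = model.to(device).eval()

@torch.no_grad()
def encode_images(images: torch.Tensor) -> torch.Tensor:
    """Return L2-normalized CLIP embeddings (on CPU) for preprocessed images."""
    feats = model.encode_image(images.to(device))
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.cpu()

@torch.no_grad()
def few_shot_accuracy(support_images, support_labels, query_images, query_labels):
    """Nearest-class-mean baseline: average the K support embeddings of each
    class into a prototype, then classify queries by cosine similarity."""
    support = encode_images(support_images)          # (num_classes * K, D)
    query = encode_images(query_images)              # (num_queries, D)
    num_classes = int(support_labels.max()) + 1
    prototypes = torch.stack(
        [support[support_labels == c].mean(dim=0) for c in range(num_classes)]
    )
    prototypes = prototypes / prototypes.norm(dim=-1, keepdim=True)
    preds = (query @ prototypes.T).argmax(dim=-1)    # cosine-similarity argmax
    return (preds == query_labels).float().mean().item()
```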

Cite

Text

Kravets et al. "Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting." International Conference on Computer Vision, 2025.

Markdown

[Kravets et al. "Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/kravets2025iccv-rethinking/)

BibTeX

@inproceedings{kravets2025iccv-rethinking,
  title     = {{Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting}},
  author    = {Kravets, Alexey and Chen, Da and Namboodiri, Vinay P.},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {1902--1911},
  url       = {https://mlanthology.org/iccv/2025/kravets2025iccv-rethinking/}
}