Logits DeConfusion with CLIP for Few-Shot Learning

Abstract

With its powerful vision-language alignment capability, CLIP performs well in zero-shot and few-shot learning tasks. However, our experiments show that CLIP's logits suffer from severe inter-class confusion in downstream tasks, and this ambiguity between categories seriously degrades accuracy. To address this challenge, we propose a novel method called Logits DeConfusion, which learns and eliminates inter-class confusion in the logits by combining our Multi-level Adapter Fusion (MAF) module with our Inter-Class Deconfusion (ICD) module. MAF extracts features from different levels and fuses them uniformly to enhance the feature representation. ICD removes inter-class confusion from the logits through a learnable residual structure. Experimental results show that our method significantly improves classification performance and alleviates the inter-class confusion problem. The code is available at https://github.com/LiShuo1001/LDC.
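The abstract only sketches the two modules at a high level; the authors' actual architecture is in the linked repository. Purely as an illustration of the ICD idea, the following minimal NumPy sketch models inter-class confusion as a linear "leakage" matrix whose estimated contribution is subtracted from the raw logits via a residual connection. All names, shapes, and values here are assumptions for demonstration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 5

# Raw CLIP-style logits for a batch of 3 images (hypothetical values).
logits = rng.normal(size=(3, num_classes))

# Hypothetical "confusion" matrix (learned in the real method): entry
# (i, j) models how much class j's evidence leaks into class i's logit.
confusion = 0.1 * rng.normal(size=(num_classes, num_classes))
np.fill_diagonal(confusion, 0.0)  # no self-confusion on the diagonal

# Residual de-confusion: subtract the estimated inter-class leakage,
# leaving the original logits as the identity path of the residual.
deconfused = logits - logits @ confusion.T

assert deconfused.shape == logits.shape
```

In the paper this correction is learned end-to-end rather than fixed; the sketch only shows why a residual structure is natural here: when the confusion estimate is zero, the logits pass through unchanged.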

Cite

Text

Li et al. "Logits DeConfusion with CLIP for Few-Shot Learning." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02366

Markdown

[Li et al. "Logits DeConfusion with CLIP for Few-Shot Learning." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/li2025cvpr-logits/) doi:10.1109/CVPR52734.2025.02366

BibTeX

@inproceedings{li2025cvpr-logits,
  title     = {{Logits DeConfusion with CLIP for Few-Shot Learning}},
  author    = {Li, Shuo and Liu, Fang and Hao, Zehua and Wang, Xinyi and Li, Lingling and Liu, Xu and Chen, Puhua and Ma, Wenping},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {25411--25421},
  doi       = {10.1109/CVPR52734.2025.02366},
  url       = {https://mlanthology.org/cvpr/2025/li2025cvpr-logits/}
}