CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention

Guo, Ziyu; Zhang, Renrui; Qiu, Longtian; Ma, Xianzheng; Miao, Xupeng; He, Xuming; Cui, Bin

doi:10.1609/AAAI.V37I1.25152

AAAI 2023 pp. 746-754


Abstract

Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual representations with promising zero-shot performance. To further improve its downstream accuracy, existing works propose additional learnable modules upon CLIP and fine-tune them by few-shot training sets. However, the resulting extra training cost and data requirement severely hinder the efficiency for model deployment and knowledge transfer. In this paper, we introduce a free-lunch enhancement method, CALIP, to boost CLIP's zero-shot performance via a parameter-free attention module. Specifically, we guide visual and textual representations to interact with each other and explore cross-modal informative features via attention. As the pre-training has largely reduced the embedding distances between two modalities, we discard all learnable parameters in the attention and bidirectionally update the multi-modal features, enabling the whole process to be parameter-free and training-free. In this way, the images are blended with textual-aware signals and the text representations become visual-guided for better adaptive zero-shot alignment. We evaluate CALIP on various benchmarks of 14 datasets for both 2D image and 3D point cloud few-shot classification, showing consistent zero-shot performance improvement over CLIP. Based on that, we further insert a small number of linear layers in CALIP's attention module and verify our robustness under the few-shot settings, which also achieves leading performance compared to existing methods. Those extensive experiments demonstrate the superiority of our approach for efficient enhancement of CLIP. Code is available at https://github.com/ZiyuGuo99/CALIP.
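The bidirectional, parameter-free update described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function name `parameter_free_attention`, the `scale` temperature, the L2 normalization, and the feature shapes are all assumptions made for illustration; the abstract does not specify them, nor how the updated features are combined with the originals for classification.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def parameter_free_attention(visual, text, scale=1.0):
    """Cross-modal attention with no learnable projections (illustrative only).

    visual: (N, D) array of visual features, e.g. N spatial locations.
    text:   (K, D) array of text features, e.g. one per class prompt.
    scale:  hypothetical temperature applied before the softmax.
    """
    visual = l2_normalize(visual)
    text = l2_normalize(text)

    # Similarity between every visual feature and every text feature.
    # No query/key/value weight matrices are used.
    attn = visual @ text.T  # (N, K)

    # Visual features become weighted mixtures of text features.
    visual_updated = softmax(attn * scale, axis=1) @ text  # (N, D)

    # Text features become weighted mixtures of visual features.
    text_updated = softmax(attn.T * scale, axis=1) @ visual  # (K, D)

    return visual_updated, text_updated


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = rng.normal(size=(49, 512))  # assumed: 7x7 grid of 512-d features
    t = rng.normal(size=(10, 512))  # assumed: 10 class prompts
    v_new, t_new = parameter_free_attention(v, t)
    print(v_new.shape, t_new.shape)
```

Because the attention weights come directly from feature similarities, the sketch has nothing to train. It relies on the abstract's premise that pre-training has already brought the two embedding spaces close together.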

Cite

Text

Guo et al. "CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I1.25152

Markdown

[Guo et al. "CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/guo2023aaai-calip/) doi:10.1609/AAAI.V37I1.25152

BibTeX

@inproceedings{guo2023aaai-calip,
  title     = {{CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention}},
  author    = {Guo, Ziyu and Zhang, Renrui and Qiu, Longtian and Ma, Xianzheng and Miao, Xupeng and He, Xuming and Cui, Bin},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {746--754},
  doi       = {10.1609/AAAI.V37I1.25152},
  url       = {https://mlanthology.org/aaai/2023/guo2023aaai-calip/}
}