CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning

Abstract

In this paper, we study the problem of Compositional Zero-Shot Learning (CZSL), which aims to recognize novel attribute-object compositions built from pre-existing concepts. Recent work focuses on applying large-scale Vision-Language Pre-trained (VLP) models such as CLIP, which have strong generalization ability. However, these methods treat the pre-trained model as a black box and concentrate on pre- and post-CLIP operations, without mining the semantic concepts encoded in the layers inside CLIP. We propose to dive deep into the architecture and insert adapters, a parameter-efficient technique proven effective for large language models, into each CLIP encoder layer. We further equip the adapters with concept awareness so that concept-specific features of "object", "attribute", and "composition" can be extracted. We evaluate our method on four popular CZSL datasets, MIT-States, C-GQA, UT-Zappos, and VAW-CZSL, and achieve state-of-the-art performance on all of them.
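To make the intra-layer idea concrete, below is a minimal PyTorch sketch of a concept-aware adapter, assuming the standard bottleneck adapter design (down-projection, nonlinearity, up-projection, residual connection) with one branch per concept. The class names, bottleneck size, and branch-selection interface are illustrative assumptions, not the paper's actual implementation.

import torch.nn as nn

class BottleneckAdapter(nn.Module):
    # Standard bottleneck adapter: down-project, nonlinearity,
    # up-project, plus a residual connection.
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

class ConceptAwareAdapter(nn.Module):
    # Hypothetical concept-aware variant: one bottleneck branch per
    # concept ("attribute", "object", "composition"), selected at
    # forward time so each branch learns concept-specific features.
    def __init__(self, dim: int, bottleneck: int = 64,
                 concepts=("attribute", "object", "composition")):
        super().__init__()
        self.branches = nn.ModuleDict(
            {c: BottleneckAdapter(dim, bottleneck) for c in concepts}
        )

    def forward(self, x, concept: str):
        return self.branches[concept](x)

In this sketch, an adapter of this form would be inserted after each CLIP encoder layer while the pre-trained weights stay frozen, so only the small adapter branches are trained.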

Cite

Text

Zheng et al. "CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning." Winter Conference on Applications of Computer Vision, 2024.

Markdown

[Zheng et al. "CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/zheng2024wacv-caila/)

BibTeX

@inproceedings{zheng2024wacv-caila,
  title     = {{CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning}},
  author    = {Zheng, Zhaoheng and Zhu, Haidong and Nevatia, Ram},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2024},
  pages     = {1721--1731},
  url       = {https://mlanthology.org/wacv/2024/zheng2024wacv-caila/}
}