DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning

Chen, Zhuo; Huang, Yufeng; Chen, Jiaoyan; Geng, Yuxia; Zhang, Wen; Fang, Yin; Pan, Jeff Z.; Chen, Huajun

doi:10.1609/AAAI.V37I1.25114

AAAI 2023 pp. 405-413

Abstract

Zero-shot learning (ZSL) aims to predict unseen classes whose samples have never appeared during training. Attributes, which are annotations of class-level visual characteristics, are among the most effective and widely used forms of semantic information for zero-shot image classification. However, current methods often fail to discriminate subtle visual distinctions between images, owing not only to the shortage of fine-grained annotations but also to attribute imbalance and co-occurrence. In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) apply an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; and (3) propose a multi-task learning policy for considering multi-model objectives. We find that DUET achieves state-of-the-art performance on three standard ZSL benchmarks and a knowledge-graph-equipped ZSL benchmark. Its components are effective and its predictions are interpretable.

Cite

Text

Chen et al. "DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I1.25114

Markdown

[Chen et al. "DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/chen2023aaai-duet/) doi:10.1609/AAAI.V37I1.25114

BibTeX

@inproceedings{chen2023aaai-duet,
  title     = {{DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning}},
  author    = {Chen, Zhuo and Huang, Yufeng and Chen, Jiaoyan and Geng, Yuxia and Zhang, Wen and Fang, Yin and Pan, Jeff Z. and Chen, Huajun},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {405-413},
  doi       = {10.1609/AAAI.V37I1.25114},
  url       = {https://mlanthology.org/aaai/2023/chen2023aaai-duet/}
}