Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning
Abstract
Generalized zero-shot learning aims to recognize both seen and unseen classes with the help of semantic information that is shared among different classes. It inevitably requires consistent visual-semantic alignment. Existing approaches fine-tune the visual backbone on seen-class data to obtain semantic-related visual features, which may cause overfitting on seen classes when only a limited number of training images is available. This paper proposes a novel visual and semantic prompt collaboration framework, which utilizes prompt tuning techniques for efficient feature adaptation. Specifically, we design a visual prompt to integrate the visual information for discriminative feature learning and a semantic prompt to integrate the semantic information for visual-semantic alignment. To achieve effective prompt information integration, we further design a weak prompt fusion mechanism for the shallow layers and a strong prompt fusion mechanism for the deep layers in the network. Through the collaboration of visual and semantic prompts, we can obtain discriminative semantic-related features for generalized zero-shot image recognition. Extensive experiments demonstrate that our framework consistently achieves favorable performance in both conventional zero-shot learning and generalized zero-shot learning benchmarks compared to other state-of-the-art methods.
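To make the role of shared semantic information concrete, the following is a minimal sketch of attribute-based inference as commonly used in generalized zero-shot learning. It is not the framework proposed in the paper: the attribute names, class names, feature values, and the `cosine` and `predict` helpers are all hypothetical, and the visual feature is assumed to have already been mapped into the attribute space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict(feature, class_attributes):
    """Return the class whose attribute vector is most similar to the feature."""
    return max(class_attributes, key=lambda name: cosine(feature, class_attributes[name]))


# Hypothetical attribute vectors: [striped, four_legged, flies].
# "zebra" stands in for an unseen class: it has no training images,
# only a semantic description shared with the seen classes.
class_attributes = {
    "horse": [0.0, 1.0, 0.0],    # seen class
    "sparrow": [0.0, 0.0, 1.0],  # seen class
    "zebra": [1.0, 1.0, 0.0],    # unseen class
}

# Hypothetical semantic-aligned feature of a test image (striped, four-legged).
feature = [0.9, 0.8, 0.1]
prediction = predict(feature, class_attributes)
print(prediction)
```

Because the search space covers seen and unseen classes together, a feature extractor that overfits to seen classes tends to pull such features toward seen-class descriptions, which is the failure mode the abstract attributes to backbone fine-tuning.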
Cite
Text
Jiang et al. "Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01888
Markdown
[Jiang et al. "Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/jiang2025cvpr-visual/) doi:10.1109/CVPR52734.2025.01888
BibTeX
@inproceedings{jiang2025cvpr-visual,
title = {{Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning}},
author = {Jiang, Huajie and Li, Zhengxian and Yu, Xiaohan and Hu, Yongli and Yin, Baocai and Yang, Jian and Qi, Yuankai},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {20275-20285},
doi = {10.1109/CVPR52734.2025.01888},
url = {https://mlanthology.org/cvpr/2025/jiang2025cvpr-visual/}
}