CLIP-Driven Coarse-to-Fine Semantic Guidance for Fine-Grained Open-Set Semi-Supervised Learning

Abstract

Fine-grained open-set semi-supervised learning (OSSL) addresses a practical scenario where unlabeled data may contain fine-grained out-of-distribution (OOD) samples. Because of the subtle visual differences among in-distribution (ID) samples, as well as between ID and OOD samples, separating ID from OOD samples is extremely challenging. Recent vision-language models, such as CLIP, have shown excellent generalization capabilities. However, CLIP tends to focus on general attributes and is therefore insufficient for distinguishing fine-grained details. To tackle these issues, we propose a novel CLIP-driven coarse-to-fine semantic-guided framework, named CFSG-CLIP, that progressively focuses on distinctive fine-grained clues. Specifically, CFSG-CLIP comprises a coarse-guidance branch and a fine-guidance branch derived from the pre-trained CLIP model. In the coarse-guidance branch, we design a semantic filtering module that initially filters and highlights local visual features guided by cross-modality features. In the fine-guidance branch, we further design a visual-semantic injection strategy, which embeds category-related visual cues into the visual encoder to further refine the local visual features. Through this dual-guidance framework, subtle local cues are progressively discovered to distinguish the fine differences between ID and OOD samples. Extensive experiments demonstrate that CFSG-CLIP achieves competitive performance on multiple fine-grained datasets. The source code is available at https://github.com/LxxxxK/CFSG-CLIP.
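The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch: a coarse stage that weights local visual features by their similarity to a cross-modal (text) feature, followed by a fine stage that selects the most salient category-related cues. This is a toy NumPy illustration of the general pattern, not the paper's actual implementation; the function names `semantic_filter` and `visual_semantic_injection` and all shapes are hypothetical.

```python
import numpy as np

def semantic_filter(local_feats, text_feat):
    # Coarse guidance (hypothetical sketch): weight each local visual
    # feature by its similarity to the class text embedding, so that
    # semantically relevant regions are highlighted.
    sims = local_feats @ text_feat                # one score per location
    weights = np.exp(sims) / np.exp(sims).sum()   # softmax over locations
    return local_feats * weights[:, None]

def visual_semantic_injection(filtered_feats, k=2):
    # Fine guidance (hypothetical sketch): keep the top-k most salient
    # local cues and pool them into a single category-related vector,
    # standing in for the cues injected into the visual encoder.
    norms = np.linalg.norm(filtered_feats, axis=1)
    top = np.argsort(norms)[-k:]
    return filtered_feats[top].mean(axis=0)

rng = np.random.default_rng(0)
local = rng.normal(size=(6, 8))   # 6 local patches, feature dim 8
text = rng.normal(size=8)         # one class text embedding
refined = visual_semantic_injection(semantic_filter(local, text))
print(refined.shape)  # (8,)
```

In the real framework these two stages operate inside the CLIP visual encoder rather than as post-hoc NumPy operations; the sketch only conveys the filter-then-refine ordering.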

Cite

Text

Li et al. "CLIP-Driven Coarse-to-Fine Semantic Guidance for Fine-Grained Open-Set Semi-Supervised Learning." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02822

Markdown

[Li et al. "CLIP-Driven Coarse-to-Fine Semantic Guidance for Fine-Grained Open-Set Semi-Supervised Learning." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/li2025cvpr-clipdriven/) doi:10.1109/CVPR52734.2025.02822

BibTeX

@inproceedings{li2025cvpr-clipdriven,
  title     = {{CLIP-Driven Coarse-to-Fine Semantic Guidance for Fine-Grained Open-Set Semi-Supervised Learning}},
  author    = {Li, Xiaokun and Huang, Yaping and Guan, Qingji},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {30312-30321},
  doi       = {10.1109/CVPR52734.2025.02822},
  url       = {https://mlanthology.org/cvpr/2025/li2025cvpr-clipdriven/}
}