Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning

Abstract

Fine-tuning vision-language models (VLMs) with large amounts of unlabeled data has recently garnered significant interest. However, a key challenge remains the lack of high-quality pseudo-labeled data. Current pseudo-labeling strategies often struggle with mismatches between semantic and visual information, leading to sub-optimal performance of unsupervised prompt learning (UPL) methods. In this paper, we introduce a simple yet effective approach called Augmenting Discriminative Richness via Diffusions (AiR), which learns a richer, more discriminative representation of each class and thus facilitates classification. Specifically, our approach includes a pseudo-label generation module that leverages high-fidelity synthetic samples to create an auxiliary classifier, which captures richer visual variation and bridges text-image-pair classification to a more robust image-image-pair classification. Additionally, we exploit the diversity of diffusion-based synthetic samples to enhance prompt learning, providing more information for semantic-visual alignment. Extensive experiments on five public benchmarks, including RESISC45 and Flowers102, and across three learning paradigms, namely UL, SSL, and TRZSL, demonstrate that AiR achieves substantial and consistent performance improvements over state-of-the-art unsupervised prompt learning methods.
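The idea of moving from text-image to image-image classification can be illustrated with a minimal sketch. The abstract does not specify how the auxiliary classifier is built, so the nearest-prototype rule below is an assumption chosen for illustration, not the authors' method. The function names (`class_prototypes`, `pseudo_label`) and the toy two-dimensional feature vectors are hypothetical; in practice the features would presumably come from an image encoder applied to diffusion-generated and unlabeled real images.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_prototypes(synthetic_feats):
    """Average the normalized features of each class's synthetic images.

    synthetic_feats: dict mapping class name -> list of feature vectors
    (hypothetical stand-ins for encoder outputs on synthetic samples).
    """
    prototypes = {}
    for name, feats in synthetic_feats.items():
        normed = [l2_normalize(f) for f in feats]
        dim = len(normed[0])
        mean = [sum(f[i] for f in normed) / len(normed) for i in range(dim)]
        prototypes[name] = l2_normalize(mean)
    return prototypes


def pseudo_label(feat, prototypes):
    """Assign the class whose image prototype has the highest cosine similarity."""
    query = l2_normalize(feat)
    scores = {
        name: sum(a * b for a, b in zip(query, proto))
        for name, proto in prototypes.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy example: two classes, two synthetic samples each (assumed values).
synthetic = {
    "forest": [[1.0, 0.1], [0.9, 0.2]],
    "airport": [[0.1, 1.0], [0.2, 0.8]],
}
prototypes = class_prototypes(synthetic)
label, score = pseudo_label([0.8, 0.3], prototypes)
```

Here an unlabeled image is compared against image-derived class prototypes rather than against a text embedding of the class name, which is the sense in which the classification becomes image-image instead of text-image. A confidence threshold on `score` could be used to keep only reliable pseudo-labels, though that filtering step is likewise an assumption.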

Cite

Text

Ren et al. "Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02340

Markdown

[Ren et al. "Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/ren2025cvpr-beyond/) doi:10.1109/CVPR52734.2025.02340

BibTeX

@inproceedings{ren2025cvpr-beyond,
  title     = {{Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning}},
  author    = {Ren, Hairui and Tang, Fan and Zhao, He and Wang, Zixuan and Guo, Dandan and Chang, Yi},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {25135--25144},
  doi       = {10.1109/CVPR52734.2025.02340},
  url       = {https://mlanthology.org/cvpr/2025/ren2025cvpr-beyond/}
}