CLIP-Driven View-Aware Prompt Learning for Unsupervised Vehicle Re-Identification

Xu, Jiyang; Wang, Qi; Xiong, Xin; Gai, Di; Zhou, Ruihua; Wang, Dong

doi:10.1609/AAAI.V39I8.32962

AAAI 2025 pp. 8896-8904

Abstract

With the emergence of vision-language pre-trained models such as CLIP, textual prompts have recently been introduced into re-identification (Re-ID) tasks to obtain more robust multimodal information. However, most textual descriptions used in vehicle Re-ID contain only identity index words and no words that describe vehicle view information, which makes them difficult to apply widely to vehicle Re-ID tasks with view variations. This motivates us to propose a CLIP-driven view-aware prompt learning framework for unsupervised vehicle Re-ID. We first design a learnable textual prompt template, called view-aware context optimization (ViewCoOp), based on dynamic multi-view word embeddings, which captures the proportion and position encoding of each view within the whole vehicle body region. Subsequently, a cross-modal mutual graph is constructed to explore inter-modal and intra-modal connections. Each sample is treated as a graph node that carries the textual features extracted with ViewCoOp and the visual features of the image. Moreover, the connectivity between graph node pairs is determined from the inter-cluster and intra-cluster correlations in the bimodal clustering. Lastly, the proposed cross-modal mutual graph method uses supervisory information from the bimodal gap to directly fine-tune the image encoder of CLIP for downstream unsupervised vehicle Re-ID tasks. Extensive experiments verify that the proposed method effectively obtains cross-modal description ability from multiple views.
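The abstract does not spell out how cluster correlations turn into graph edges, so the following is only a minimal illustrative sketch under a stated assumption: two samples are linked when both the visual clustering and the textual clustering place them in the same cluster. The function name `mutual_graph`, the label lists, and the agreement rule itself are hypothetical and are not taken from the paper.

```python
from typing import List


def mutual_graph(visual_labels: List[int], text_labels: List[int]) -> List[List[int]]:
    """Build a symmetric 0/1 adjacency matrix over samples.

    Assumed rule (hypothetical, not the paper's): connect samples i and j
    only if they share a cluster in BOTH modalities. A label of -1 marks
    an unclustered sample, which is never connected to anything.
    """
    if len(visual_labels) != len(text_labels):
        raise ValueError("label lists must have the same length")
    n = len(visual_labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops
            same_visual = visual_labels[i] == visual_labels[j] != -1
            same_text = text_labels[i] == text_labels[j] != -1
            if same_visual and same_text:
                adj[i][j] = 1
    return adj


# Toy cluster assignments for five samples (made-up values).
visual = [0, 0, 1, 1, -1]
text = [0, 0, 0, 1, -1]
adjacency = mutual_graph(visual, text)
```

In this toy input only samples 0 and 1 agree in both modalities, so they form the single edge; samples 2 and 3 share a visual cluster but not a textual one and stay disconnected.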

Cite

Text

Xu et al. "CLIP-Driven View-Aware Prompt Learning for Unsupervised Vehicle Re-Identification." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I8.32962

Markdown

[Xu et al. "CLIP-Driven View-Aware Prompt Learning for Unsupervised Vehicle Re-Identification." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/xu2025aaai-clip/) doi:10.1609/AAAI.V39I8.32962

BibTeX

@inproceedings{xu2025aaai-clip,
  title     = {{CLIP-Driven View-Aware Prompt Learning for Unsupervised Vehicle Re-Identification}},
  author    = {Xu, Jiyang and Wang, Qi and Xiong, Xin and Gai, Di and Zhou, Ruihua and Wang, Dong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {8896--8904},
  doi       = {10.1609/AAAI.V39I8.32962},
  url       = {https://mlanthology.org/aaai/2025/xu2025aaai-clip/}
}