CLIP-Driven View-Aware Prompt Learning for Unsupervised Vehicle Re-Identification
Abstract
With the emergence of vision-language pre-trained models such as CLIP, textual prompts have recently been introduced into re-identification (Re-ID) tasks to obtain more robust multimodal information. However, most textual descriptions used for vehicle Re-ID contain only identity index words and no words describing the vehicle's view, which makes them difficult to apply widely to vehicle Re-ID tasks with view variations. This motivates us to propose a CLIP-driven view-aware prompt learning framework for unsupervised vehicle Re-ID. We first design a learnable textual prompt template, called view-aware context optimization (ViewCoOp), based on dynamic multi-view word embeddings, which captures the proportion and position encoding of each view within the whole vehicle body region. Subsequently, a cross-modal mutual graph is constructed to explore inter-modal and intra-modal connections. Each sample is treated as a graph node that carries textual features extracted with ViewCoOp and visual features extracted from the image. Moreover, the connectivity between pairs of graph nodes is determined from the inter-cluster and intra-cluster correlations in the bimodal clustering results. Lastly, the proposed cross-modal mutual graph method uses supervisory information from the bimodal gap to directly fine-tune the image encoder of CLIP for downstream unsupervised vehicle Re-ID tasks. Extensive experiments verify that the proposed method effectively acquires cross-modal descriptive ability across multiple views.
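The abstract does not specify how cluster correlations are turned into graph edges, so the following is only a minimal sketch under one simple assumption: two samples are connected when they fall in the same cluster in both the visual and the textual clustering. The function name `mutual_adjacency` and the toy labels are hypothetical and are not taken from the paper.

```python
def mutual_adjacency(visual_labels, text_labels):
    """Build a boolean adjacency matrix over samples (graph nodes).

    Assumption (not from the paper): nodes i and j are connected only if
    they share a cluster in BOTH the visual and the textual clustering.
    """
    n = len(visual_labels)
    if len(text_labels) != n:
        raise ValueError("both label lists must cover the same samples")

    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same_visual = visual_labels[i] == visual_labels[j]
            same_text = text_labels[i] == text_labels[j]
            if same_visual and same_text:
                # Undirected edge: mark both directions.
                adj[i][j] = adj[j][i] = True
    return adj


# Toy example: four samples with hypothetical pseudo-labels per modality.
visual_labels = [0, 0, 1, 1]
text_labels = [0, 0, 0, 1]
adj = mutual_adjacency(visual_labels, text_labels)
```

In this toy case only samples 0 and 1 agree in both modalities, so they form the single edge. Samples 2 and 3 share a visual cluster but not a textual one, which is the kind of bimodal disagreement a graph of this sort would have to resolve.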
Cite
Text
Xu et al. "CLIP-Driven View-Aware Prompt Learning for Unsupervised Vehicle Re-Identification." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I8.32962
Markdown
[Xu et al. "CLIP-Driven View-Aware Prompt Learning for Unsupervised Vehicle Re-Identification." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/xu2025aaai-clip/) doi:10.1609/AAAI.V39I8.32962
BibTeX
@inproceedings{xu2025aaai-clip,
title = {{CLIP-Driven View-Aware Prompt Learning for Unsupervised Vehicle Re-Identification}},
author = {Xu, Jiyang and Wang, Qi and Xiong, Xin and Gai, Di and Zhou, Ruihua and Wang, Dong},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {8896-8904},
doi = {10.1609/AAAI.V39I8.32962},
url = {https://mlanthology.org/aaai/2025/xu2025aaai-clip/}
}