ChatReID: Open-Ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models
Abstract
Person re-identification (Re-ID) is a crucial task in computer vision that aims to recognize individuals across non-overlapping camera views. While recent vision-language models (VLMs) excel at logical reasoning and multi-task generalization, their application to Re-ID remains limited: they either struggle to perform accurate matching based on identity-relevant features or serve only as auxiliary semantics for image-dominated branches. In this paper, we propose a novel framework, ChatReID, that shifts the focus towards a text-side-dominated retrieval paradigm, enabling flexible and interactive re-identification. To integrate the reasoning abilities of language models into Re-ID pipelines, we first present a large-scale instruction dataset containing more than 8 million prompts to support model fine-tuning. Next, we introduce a hierarchical progressive tuning strategy that endows the model with Re-ID ability through three stages of tuning, i.e., from person attribute understanding to fine-grained image retrieval and then to multi-modal task reasoning. Extensive experiments across ten popular benchmarks demonstrate that ChatReID outperforms existing methods, achieving state-of-the-art performance in all Re-ID tasks. Further experiments show that ChatReID can not only recognize fine-grained details but also integrate them into a coherent reasoning process.
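The abstract names the three tuning stages but gives no implementation details. As a rough illustration only, the sketch below assumes that "hierarchical progressive tuning" means one fine-tuning pass per stage over that stage's instruction subset, run strictly in the listed order. The stage names, prompt templates, `make_sample`, `progressive_tune`, and the `train_stage` callback are all hypothetical and are not taken from the paper; `train_stage` stands in for whatever VLM training loop is actually used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Stage order follows the abstract: attribute understanding ->
# fine-grained retrieval -> multi-modal reasoning. Names are hypothetical.
STAGES: List[str] = [
    "attribute_understanding",
    "fine_grained_retrieval",
    "multimodal_reasoning",
]

# Hypothetical prompt templates, one per stage; not ChatReID's actual prompts.
TEMPLATES: Dict[str, str] = {
    "attribute_understanding": "Describe the upper-body clothing of the person in {img0}.",
    "fine_grained_retrieval": "Do {img0} and {img1} show the same person? Answer yes or no.",
    "multimodal_reasoning": "A witness says the person wore {desc}. Which of {imgs} matches {img0}?",
}


@dataclass
class InstructionSample:
    stage: str
    prompt: str
    answer: str


def make_sample(stage: str, answer: str, **fields: str) -> InstructionSample:
    """Fill the stage's template with the given fields (hypothetical format)."""
    if stage not in TEMPLATES:
        raise ValueError(f"unknown stage: {stage}")
    return InstructionSample(stage, TEMPLATES[stage].format(**fields), answer)


def progressive_tune(
    samples: Sequence[InstructionSample],
    train_stage: Callable[[str, List[InstructionSample]], None],
) -> List[str]:
    """Run one tuning pass per stage, strictly in curriculum order.

    `train_stage` is a placeholder for the real fine-tuning step; here it
    only receives the stage name and that stage's samples.
    Returns the order in which stages were trained.
    """
    order: List[str] = []
    for stage in STAGES:
        stage_data = [s for s in samples if s.stage == stage]
        train_stage(stage, stage_data)
        order.append(stage)
    return order


# Usage: samples are given out of order, but training still follows STAGES.
data = [
    make_sample("fine_grained_retrieval", "yes", img0="query.jpg", img1="gallery_01.jpg"),
    make_sample("attribute_understanding", "a red jacket", img0="query.jpg"),
]
seen: List[tuple] = []
order = progressive_tune(data, lambda stage, batch: seen.append((stage, len(batch))))
print(seen)
# [('attribute_understanding', 1), ('fine_grained_retrieval', 1), ('multimodal_reasoning', 0)]
```

Keeping the stage order in a single list makes the curriculum explicit. Whether earlier-stage data is replayed in later stages is not stated in the abstract, so this sketch simply does not replay it.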
Cite
Text
Niu et al. "ChatReID: Open-Ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models." International Conference on Computer Vision, 2025.
Markdown
[Niu et al. "ChatReID: Open-Ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/niu2025iccv-chatreid/)
BibTeX
@inproceedings{niu2025iccv-chatreid,
title = {{ChatReID: Open-Ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models}},
author = {Niu, Ke and Yu, Haiyang and Zhao, Mengyang and Fu, Teng and Yi, Siyang and Lu, Wei and Li, Bin and Qian, Xuelin and Xue, Xiangyang},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {24245-24254},
url = {https://mlanthology.org/iccv/2025/niu2025iccv-chatreid/}
}