Generalizable Object Re-Identification via Visual In-Context Prompting

Abstract

Current object re-identification (ReID) methods train domain-specific models (e.g., for persons or vehicles), which generalize poorly and demand costly labeled data for each new category. While self-supervised learning reduces annotation needs by learning instance-wise invariance, it struggles to capture the identity-sensitive features critical for ReID. This paper proposes Visual In-Context Prompting (VICP), a novel framework in which models trained on seen categories generalize directly to unseen categories using only in-context examples as prompts, without requiring parameter adaptation. VICP combines large language models (LLMs) with vision foundation models (VFMs): LLMs infer semantic identity rules from few-shot positive/negative pairs through task-specific prompting, and these rules then guide a VFM (e.g., DINO) to extract ID-discriminative features via dynamic visual prompts. By aligning LLM-derived semantic concepts with the VFM's pre-trained prior, VICP enables generalization to novel categories, eliminating the need for dataset-specific retraining. To support evaluation, we introduce ShopID10K, a dataset of 10K object instances from e-commerce platforms, featuring multi-view images and cross-domain testing. Experiments on ShopID10K and diverse ReID benchmarks demonstrate that VICP outperforms baselines by a clear margin on unseen categories. Code is available at https://github.com/Hzzone/VICP.
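
For readers unfamiliar with the task, ReID is usually evaluated as retrieval: each image is mapped to a feature vector, and gallery items are ranked by their similarity to a query. The sketch below illustrates only that generic ranking step with cosine similarity. It is not the authors' implementation; the function names `cosine` and `rank_gallery`, the item ids, and the three-dimensional toy vectors are all hypothetical stand-ins for features that a VFM would produce.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_gallery(query, gallery):
    """Return gallery ids sorted by descending cosine similarity to the query."""
    scored = [(cosine(query, feat), gid) for gid, feat in gallery.items()]
    # Sort by similarity (highest first); break ties by id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [gid for _, gid in scored]


# Toy features (assumed values); a real system would use VFM embeddings.
gallery = {
    "item_a": [1.0, 0.0, 0.2],
    "item_b": [0.0, 1.0, 0.1],
    "item_c": [0.9, 0.1, 0.3],
}
query = [1.0, 0.05, 0.25]

ranking = rank_gallery(query, gallery)
print(ranking)
```

In this framing, a method generalizes to an unseen category when the features it extracts place images of the same instance near the top of such a ranking without any retraining.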

Cite

Text

Huang and Liu. "Generalizable Object Re-Identification via Visual In-Context Prompting." International Conference on Computer Vision, 2025.

Markdown

[Huang and Liu. "Generalizable Object Re-Identification via Visual In-Context Prompting." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/huang2025iccv-generalizable/)

BibTeX

@inproceedings{huang2025iccv-generalizable,
  title     = {{Generalizable Object Re-Identification via Visual In-Context Prompting}},
  author    = {Huang, Zhizhong and Liu, Xiaoming},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {22539--22550},
  url       = {https://mlanthology.org/iccv/2025/huang2025iccv-generalizable/}
}