ReferGPT: Towards Zero-Shot Referring Multi-Object Tracking

Abstract

Tracking multiple objects based on textual queries is a challenging task that requires linking language understanding with object association across frames. Previous works typically train the whole process end-to-end or integrate an additional referring text module into a multi-object tracker, but both approaches require supervised training and may struggle to generalize to open-set queries. In this work, we introduce ReferGPT, a novel zero-shot referring multi-object tracking framework. We provide a multi-modal large language model (MLLM) with spatial knowledge, enabling it to generate 3D-aware captions. This enhances its descriptive capabilities and supports a more flexible referring vocabulary without training. We also propose a robust query-matching strategy that leverages CLIP-based semantic encoding and fuzzy matching to associate MLLM-generated captions with user queries. Extensive experiments on Refer-KITTI, Refer-KITTIv2 and Refer-KITTI+ demonstrate that ReferGPT achieves competitive performance against trained methods, showcasing its robustness and zero-shot capabilities in autonomous driving. The code will be made publicly available on GitHub.

Cite

Text

Chamiti et al. "ReferGPT: Towards Zero-Shot Referring Multi-Object Tracking." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Chamiti et al. "ReferGPT: Towards Zero-Shot Referring Multi-Object Tracking." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/chamiti2025cvprw-refergpt/)

BibTeX

@inproceedings{chamiti2025cvprw-refergpt,
  title     = {{ReferGPT: Towards Zero-Shot Referring Multi-Object Tracking}},
  author    = {Chamiti, Tzoulio and Di Bella, Leandro and Munteanu, Adrian and Deligiannis, Nikos},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {3849--3858},
  url       = {https://mlanthology.org/cvprw/2025/chamiti2025cvprw-refergpt/}
}