HFF-Tracker: A Hierarchical Fine-Grained Fusion Tracker for Referring Multi-Object Tracking

Zhao, Zeyong; Hao, Yanchao; Zhang, Minghao; Liu, Qingbin; Li, Bo; Sui, Dianbo; He, Shizhu; Chen, Xi

doi:10.1609/AAAI.V39I10.33143

AAAI 2025 pp. 10528-10536

Abstract

Referring Multi-Object Tracking (RMOT) aims to track multiple objects based on a provided language expression. Although prior studies have sought to accomplish this by integrating a textual module into the multi-object tracker, these methods combine text and image features in a basic way, neglecting the importance of text features. In this study, we propose a Hierarchical Fine-grained text-image Fusion tracker, named HFF-Tracker, which performs fine-grained fusion of pixel-level visual features and text features across various semantic levels. Specifically, we devise a Hierarchical Multi-Modal Fusion (HMMF) module to merge text and image features at an early stage in a hierarchical and detailed manner. The Text-Guided Decoder (TGD) is designed to provide the query with prior semantic information during the decoding process. Additionally, we craft a Text-Guided Prediction Head (TGPH) that uses text information to enhance the performance of the prediction head. Furthermore, we implement an adaptive Look-Back training strategy to maximize the utilization of valuable labeled data. Extensive experiments on the Refer-KITTI and Refer-KITTI-V2 datasets demonstrate that the proposed HFF-Tracker outperforms other state-of-the-art methods by remarkable margins.

Cite

Text

Zhao et al. "HFF-Tracker: A Hierarchical Fine-Grained Fusion Tracker for Referring Multi-Object Tracking." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I10.33143

Markdown

[Zhao et al. "HFF-Tracker: A Hierarchical Fine-Grained Fusion Tracker for Referring Multi-Object Tracking." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhao2025aaai-hff/) doi:10.1609/AAAI.V39I10.33143

BibTeX

@inproceedings{zhao2025aaai-hff,
  title     = {{HFF-Tracker: A Hierarchical Fine-Grained Fusion Tracker for Referring Multi-Object Tracking}},
  author    = {Zhao, Zeyong and Hao, Yanchao and Zhang, Minghao and Liu, Qingbin and Li, Bo and Sui, Dianbo and He, Shizhu and Chen, Xi},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {10528--10536},
  doi       = {10.1609/AAAI.V39I10.33143},
  url       = {https://mlanthology.org/aaai/2025/zhao2025aaai-hff/}
}