Object-Centric Multiple Object Tracking

Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

ICCV 2023 pp. 16601-16611

doi:10.1109/ICCV51070.2023.01522

Abstract

Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines. Unfortunately, they lack two key properties: objects are often split into parts and are not consistently tracked over time. In fact, state-of-the-art models achieve pixel-level accuracy and temporal consistency by relying on supervised object detection with additional ID labels for the association through time. This paper proposes a video object-centric model for MOT. It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module that builds complete object prototypes to handle occlusions. Benefiting from object-centric learning, we only require sparse detection labels (0%-6.25%) for object localization and feature binding. Relying on our self-supervised Expectation-Maximization-inspired loss for object association, our approach requires no ID labels. In our experiments, the approach significantly narrows the gap between the existing object-centric model and the fully supervised state of the art, and it outperforms several unsupervised trackers that also do not require ID labels.

Cite

Text

Zhao et al. "Object-Centric Multiple Object Tracking." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01522

Markdown

[Zhao et al. "Object-Centric Multiple Object Tracking." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/zhao2023iccv-objectcentric/) doi:10.1109/ICCV51070.2023.01522

BibTeX

@inproceedings{zhao2023iccv-objectcentric,
  title     = {{Object-Centric Multiple Object Tracking}},
  author    = {Zhao, Zixu and Wang, Jiaze and Horn, Max and Ding, Yizhuo and He, Tong and Bai, Zechen and Zietlow, Dominik and Simon-Gabriel, Carl-Johann and Shuai, Bing and Tu, Zhuowen and Brox, Thomas and Schiele, Bernt and Fu, Yanwei and Locatello, Francesco and Zhang, Zheng and Xiao, Tianjun},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {16601-16611},
  doi       = {10.1109/ICCV51070.2023.01522},
  url       = {https://mlanthology.org/iccv/2023/zhao2023iccv-objectcentric/}
}