Tracking Objects as Pixel-Wise Distributions

Abstract

Multi-object tracking (MOT) requires detecting and associating objects through frames. Unlike tracking via detected bounding boxes or center points, we propose tracking objects as pixel-wise distributions. We instantiate this idea on a transformer-based architecture named P3AFormer, with pixel-wise propagation, prediction, and association. P3AFormer propagates pixel-wise features guided by flow information to pass messages between frames. Further, P3AFormer adopts a meta-architecture to produce multi-scale object feature maps. During inference, a pixel-wise association procedure is proposed to recover object connections through frames based on the pixel-wise prediction. P3AFormer yields 81.2% MOTA on the MOT17 benchmark -- the highest among all transformer networks in the literature to reach 80% MOTA. P3AFormer also outperforms state-of-the-art methods on the MOT20 and KITTI benchmarks. The code is at https://github.com/dvlab-research/ECCV22-P3AFormer-Tracking-Objects-as-Pixel-wise-Distributions.
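To make the "tracking objects as pixel-wise distributions" idea concrete, here is a minimal sketch of an association step over per-object center heatmaps: each object's distribution is reduced to an expected center via soft-argmax, and objects are matched between frames by center distance. The names `soft_argmax` and `associate`, the greedy matching, and the `max_dist` threshold are illustrative assumptions for this sketch, not the paper's actual procedure (which also exploits flow-guided propagation and multi-scale features).

```python
import numpy as np

def soft_argmax(heatmap):
    # Expected (row, col) position under the normalized pixel-wise distribution.
    p = heatmap / heatmap.sum()
    rows, cols = np.indices(heatmap.shape)
    return np.array([(p * rows).sum(), (p * cols).sum()])

def associate(prev_maps, curr_maps, max_dist=8.0):
    # Greedy frame-to-frame matching on distribution centers; returns a list of
    # (prev_index, curr_index) pairs. A full tracker would additionally use
    # appearance/flow cues, and unmatched detections would start new tracks.
    prev_c = [soft_argmax(m) for m in prev_maps]
    curr_c = [soft_argmax(m) for m in curr_maps]
    if not prev_c or not curr_c:
        return []
    cost = np.array([[np.linalg.norm(a - b) for b in curr_c] for a in prev_c])
    matches, used = [], set()
    for i in np.argsort(cost.min(axis=1)):  # most confident rows first
        row = cost[i].copy()
        row[list(used)] = np.inf            # mask already-claimed detections
        j = int(np.argmin(row))
        if row[j] <= max_dist:
            matches.append((int(i), j))
            used.add(j)
    return matches
```

For example, two objects whose heatmap peaks move by roughly one pixel between frames are matched to their previous identities, while a detection far from every existing center is left unmatched.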

Cite

Text

Zhao et al. "Tracking Objects as Pixel-Wise Distributions." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-20047-2_5

Markdown

[Zhao et al. "Tracking Objects as Pixel-Wise Distributions." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/zhao2022eccv-tracking/) doi:10.1007/978-3-031-20047-2_5

BibTeX

@inproceedings{zhao2022eccv-tracking,
  title     = {{Tracking Objects as Pixel-Wise Distributions}},
  author    = {Zhao, Zelin and Wu, Ze and Zhuang, Yueqing and Li, Boxun and Jia, Jiaya},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-20047-2_5},
  url       = {https://mlanthology.org/eccv/2022/zhao2022eccv-tracking/}
}