TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses

Abstract

3D multi-object tracking (MOT) is vital for many applications, including autonomous driving vehicles and service robots. With the commonly used tracking-by-detection paradigm, 3D MOT has made important progress in recent years. However, these methods only use the detection boxes of the current frame to obtain trajectory-box association results, which makes it impossible for the tracker to recover objects missed by the detector. In this paper, we present TrajectoryFormer, a novel point-cloud-based 3D MOT framework. To recover objects missed by the detector, we generate multiple trajectory hypotheses with hybrid candidate boxes, including temporally predicted boxes and current-frame detection boxes, for trajectory-box association. The predicted boxes can propagate an object's historical trajectory information to the current frame, so the network can tolerate short-term missed detections of tracked objects. We combine long-term object motion features and short-term object appearance features to create per-hypothesis feature embeddings, which reduces the computational overhead of spatial-temporal encoding. Additionally, we introduce a Global-Local Interaction Module to conduct information interaction among all hypotheses and to model their spatial relations, leading to accurate estimation of the hypotheses. Our TrajectoryFormer achieves state-of-the-art performance on the Waymo 3D MOT benchmarks.
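As a rough illustration of the hybrid-hypothesis idea described in the abstract (not the authors' implementation), the minimal Python sketch below pairs motion-predicted boxes from existing tracks with current-frame detection boxes to form the candidate set for trajectory-box association. The constant-velocity prediction, the box parameterization, and all helper names are assumptions made for illustration only; the paper uses learned prediction and transformer-based hypothesis scoring.

import numpy as np

def predict_boxes(track_boxes, track_velocities, dt=0.1):
    # Propagate each tracked box to the current frame with a constant-velocity
    # motion model (illustrative simplification of the paper's learned prediction).
    # Boxes are assumed to be (x, y, z, l, w, h, heading).
    predicted = track_boxes.copy()
    predicted[:, :3] += track_velocities * dt  # shift box centers by velocity * dt
    return predicted

def build_hypotheses(predicted_boxes, detection_boxes, track_ids):
    # Form hybrid trajectory hypotheses: each track contributes its temporally
    # predicted box as one candidate, and every current-frame detection is an
    # additional candidate, so a track can survive a short-term missed detection.
    hypotheses = []
    for tid, box in zip(track_ids, predicted_boxes):
        hypotheses.append({"track_id": tid, "box": box, "source": "predicted"})
    for box in detection_boxes:
        hypotheses.append({"track_id": None, "box": box, "source": "detection"})
    return hypotheses

# Toy usage: one existing track and one current-frame detection of the same object.
tracks = np.array([[10.0, 2.0, 0.5, 4.5, 1.9, 1.6, 0.0]])
vels = np.array([[5.0, 0.0, 0.0]])
dets = np.array([[10.6, 2.1, 0.5, 4.4, 1.9, 1.6, 0.05]])
hyps = build_hypotheses(predict_boxes(tracks, vels), dets, track_ids=[7])
print(len(hyps), "hypotheses")  # 2: one predicted, one detected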

Cite

Text

Chen et al. "TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01698

Markdown

[Chen et al. "TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/chen2023iccv-trajectoryformer/) doi:10.1109/ICCV51070.2023.01698

BibTeX

@inproceedings{chen2023iccv-trajectoryformer,
  title     = {{TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses}},
  author    = {Chen, Xuesong and Shi, Shaoshuai and Zhang, Chao and Zhu, Benjin and Wang, Qiang and Cheung, Ka Chun and See, Simon and Li, Hongsheng},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {18527-18536},
  doi       = {10.1109/ICCV51070.2023.01698},
  url       = {https://mlanthology.org/iccv/2023/chen2023iccv-trajectoryformer/}
}