SpOT: Spatiotemporal Modeling for 3D Object Tracking
Abstract
3D multi-object tracking aims to uniquely and consistently identify all mobile entities through time. Despite the rich spatiotemporal information available in this setting, current 3D tracking methods primarily rely on abstracted information and limited history, e.g., single-frame object bounding boxes. In this work, we develop a holistic representation of traffic scenes that leverages both spatial and temporal information of the actors in the scene. Specifically, we reformulate tracking as a spatiotemporal problem by representing tracked objects as sequences of time-stamped points and bounding boxes over a long temporal history. At each timestamp, we improve the location and motion estimates of our tracked objects through learned refinement over the full sequence of object history. By considering time and space jointly, our representation naturally encodes fundamental physical priors such as object permanence and consistency across time. Our spatiotemporal tracking framework achieves state-of-the-art performance on the Waymo and nuScenes benchmarks.
Cite
Text
Stearns et al. "SpOT: Spatiotemporal Modeling for 3D Object Tracking." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19839-7_37
Markdown
[Stearns et al. "SpOT: Spatiotemporal Modeling for 3D Object Tracking." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/stearns2022eccv-spot/) doi:10.1007/978-3-031-19839-7_37
BibTeX
@inproceedings{stearns2022eccv-spot,
title = {{SpOT: Spatiotemporal Modeling for 3D Object Tracking}},
author = {Stearns, Colton and Rempe, Davis and Li, Jie and Ambruș, Rareș and Zakharov, Sergey and Guizilini, Vitor and Yang, Yanchao and Guibas, Leonidas J.},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2022},
doi = {10.1007/978-3-031-19839-7_37},
url = {https://mlanthology.org/eccv/2022/stearns2022eccv-spot/}
}