Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-Training

Abstract

3D single object tracking (SOT) is an essential task in autonomous driving and robotics. However, learning robust 3D SOT trackers remains challenging due to limited category-specific point cloud data and the inherent sparsity and incompleteness of LiDAR scans. To tackle these issues, we propose a unified 3D SOT framework that leverages 3D generative pre-training and learns robust 3D matching abilities from 2D pre-trained foundation trackers. Our framework features a target-matching architecture consistent with that of widely used 2D trackers, which facilitates the transfer of 2D matching knowledge. Specifically, we first propose a lightweight Target-Aware Projection (TAP) module, which allows the pre-trained 2D tracker to work well on projected point clouds without further fine-tuning. We then propose a novel IoU-guided matching-distillation framework that uses the powerful 2D pre-trained trackers to guide 3D matching learning in the 3D tracker: the 3D template-to-search matching should be consistent with the corresponding 2D template-to-search matching obtained from the 2D pre-trained trackers. Our designs are applied to two mainstream 3D SOT frameworks, memory-less Siamese and contextual memory-based approaches; the resulting trackers are named SiamDisst and MemDisst, respectively. Extensive experiments show that SiamDisst and MemDisst achieve state-of-the-art performance on the KITTI, Waymo Open Dataset, and nuScenes benchmarks while running faster than real time, at 25 and 90 FPS on an RTX 3090 GPU.
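
The matching-consistency idea can be illustrated with a minimal sketch. The abstract does not specify the loss, so everything here is an assumption made for illustration: the function names (`matching_map`, `iou_guided_distill_loss`) are hypothetical, the matching is modeled as a softmax over scaled dot-product similarities, the 2D and 3D tokens are assumed to be in one-to-one correspondence, and "IoU-guided" is interpreted as scaling a KL-divergence term by the IoU of the 2D teacher's prediction. This is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def matching_map(template_feats, search_feats):
    """Template-to-search matching as a row-wise softmax (shape: T x S).

    Assumption: matching is scaled dot-product similarity between tokens.
    """
    dim = template_feats.shape[1]
    scores = template_feats @ search_feats.T / np.sqrt(dim)
    return softmax(scores, axis=1)

def iou_guided_distill_loss(match_3d, match_2d, teacher_iou, eps=1e-8):
    """KL(teacher || student) averaged over template tokens.

    Assumption: the loss is scaled by the 2D teacher's box IoU, so that
    unreliable teacher predictions contribute less to the 3D student.
    """
    kl = np.sum(match_2d * (np.log(match_2d + eps) - np.log(match_3d + eps)), axis=1)
    return float(teacher_iou * kl.mean())

# Toy usage: 4 template tokens, 6 search tokens, 8-dimensional features.
rng = np.random.default_rng(0)
search = rng.normal(size=(6, 8))
teacher_map = matching_map(rng.normal(size=(4, 8)), search)  # stand-in for the 2D tracker
student_map = matching_map(rng.normal(size=(4, 8)), search)  # stand-in for the 3D tracker
loss = iou_guided_distill_loss(student_map, teacher_map, teacher_iou=0.9)
```

In this sketch the loss is zero when the student's matching map equals the teacher's, and it vanishes when the teacher's IoU is zero, which is one simple way to down-weight poor teacher predictions.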

Cite

Text

Wu et al. "Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-Training." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73254-6_16

Markdown

[Wu et al. "Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-Training." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/wu2024eccv-boosting/) doi:10.1007/978-3-031-73254-6_16

BibTeX

@inproceedings{wu2024eccv-boosting,
  title     = {{Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-Training}},
  author    = {Wu, Qiangqiang and Xia, Yan and Wan, Jia and Chan, Antoni},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73254-6_16},
  url       = {https://mlanthology.org/eccv/2024/wu2024eccv-boosting/}
}