3D Video Object Detection with Learnable Object-Centric Global Optimization

Abstract

In this work, we explore long-term temporal visual correspondence-based optimization for 3D video object detection. Visual correspondence refers to one-to-one mappings between pixels across multiple images. Correspondence-based optimization is the cornerstone of 3D scene reconstruction but is less studied in 3D video object detection, because moving objects violate multi-view geometry constraints and are treated as outliers during scene reconstruction. We address this issue by treating objects as first-class citizens during correspondence-based optimization. We propose BA-Det, an end-to-end optimizable object detector with object-centric temporal correspondence learning and featuremetric object bundle adjustment. Empirically, we verify the effectiveness and efficiency of BA-Det for multiple baseline 3D detectors under various setups. BA-Det achieves state-of-the-art (SOTA) performance on the large-scale Waymo Open Dataset (WOD) with only marginal computation cost. Our code is available at https://github.com/jiaweihe1996/BA-Det.

Cite

Text

He et al. "3D Video Object Detection with Learnable Object-Centric Global Optimization." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00494

Markdown

[He et al. "3D Video Object Detection with Learnable Object-Centric Global Optimization." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/he2023cvpr-3d/) doi:10.1109/CVPR52729.2023.00494

BibTeX

@inproceedings{he2023cvpr-3d,
  title     = {{3D Video Object Detection with Learnable Object-Centric Global Optimization}},
  author    = {He, Jiawei and Chen, Yuntao and Wang, Naiyan and Zhang, Zhaoxiang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {5106--5115},
  doi       = {10.1109/CVPR52729.2023.00494},
  url       = {https://mlanthology.org/cvpr/2023/he2023cvpr-3d/}
}