Monocular 3D Localization of Vehicles in Road Scenes

Abstract

Sensing and perception systems for autonomous driving vehicles in road scenes are composed of three crucial components: 3D-based object detection, tracking, and localization. While all three components are important, most relevant papers tend to focus on only a single one. We propose a monocular vision-based framework for 3D-based detection, tracking, and localization that effectively integrates all three tasks in a complementary manner. Our system contains an RCNN-based Localization Network (LOCNet), which works in concert with fitness evaluation score (FES) based single-frame optimization, to obtain more accurate and refined 3D vehicle localization. To better utilize temporal information, we further apply a multi-frame optimization technique, taking advantage of camera ego-motion and a 3D TrackletNet Tracker (3D TNT), to improve both the accuracy and the consistency of our 3D localization results. Our system outperforms state-of-the-art image-based solutions in diverse scenarios and is comparable even with LiDAR-based methods.

Cite

Text

Zhang et al. "Monocular 3D Localization of Vehicles in Road Scenes." IEEE/CVF International Conference on Computer Vision Workshops, 2021. doi:10.1109/ICCVW54120.2021.00320

Markdown

[Zhang et al. "Monocular 3D Localization of Vehicles in Road Scenes." IEEE/CVF International Conference on Computer Vision Workshops, 2021.](https://mlanthology.org/iccvw/2021/zhang2021iccvw-monocular/) doi:10.1109/ICCVW54120.2021.00320

BibTeX

@inproceedings{zhang2021iccvw-monocular,
  title     = {{Monocular 3D Localization of Vehicles in Road Scenes}},
  author    = {Zhang, Haotian and Ji, Haorui and Zheng, Aotian and Hwang, Jenq-Neng and Hwang, Ren-Hung},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2021},
  pages     = {2855--2864},
  doi       = {10.1109/ICCVW54120.2021.00320},
  url       = {https://mlanthology.org/iccvw/2021/zhang2021iccvw-monocular/}
}