Dimension Embeddings for Monocular 3D Object Detection

Abstract

Most existing deep learning-based approaches to monocular 3D object detection regress object dimensions directly and overlook how much accurate dimensions help with the ill-posed problem of recovering 3D geometry from a single image. In this paper, we propose a general method for learning embeddings suited to dimension estimation in monocular 3D object detection. Specifically, we use two intuitive cues when learning dimension-aware embeddings with deep neural networks. First, we constrain pairwise distances in the embedding space to reflect the similarity of the corresponding dimensions, so that the model can exploit inter-object information to learn more discriminative embeddings for dimension estimation. Second, we propose to learn representative shape templates in the dimension-aware embedding space. Through an attention mechanism, each object interacts with the learnable templates to obtain attentive dimensions as an initial estimate, which is then refined using the combined features of the object and the attended templates. Experimental results on the well-established KITTI dataset demonstrate that the proposed dimension embeddings bring consistent improvements with negligible computational overhead. We achieve new state-of-the-art performance on the KITTI 3D object detection benchmark.
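
The two ideas described in the abstract can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the squared-error form of the pairwise loss, the scaled dot-product attention, and the names `pairwise_distance_loss`, `attentive_dimensions`, `template_keys`, and `template_dims` are all hypothetical. The refinement step mentioned in the abstract is not shown.

```python
import numpy as np


def pairwise_distance_loss(embeddings, dims):
    """Assumed loss: make embedding distances match dimension distances.

    embeddings: (N, D) array, one embedding per object.
    dims:       (N, 3) array of ground-truth (height, width, length).
    """
    # Pairwise Euclidean distances between all objects, shape (N, N).
    emb_dist = np.linalg.norm(
        embeddings[:, None, :] - embeddings[None, :, :], axis=-1
    )
    dim_dist = np.linalg.norm(dims[:, None, :] - dims[None, :, :], axis=-1)
    # Mean squared gap between the two distance matrices.
    return float(np.mean((emb_dist - dim_dist) ** 2))


def attentive_dimensions(embeddings, template_keys, template_dims,
                         temperature=1.0):
    """Assumed attention: softmax over templates, then a weighted average.

    embeddings:    (N, D) object embeddings (queries).
    template_keys: (K, D) learnable template embeddings (keys).
    template_dims: (K, 3) dimensions attached to each template (values).
    Returns the (N, 3) initial dimension estimates and (N, K) weights.
    """
    scores = embeddings @ template_keys.T / temperature
    # Subtract the row-wise maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ template_dims, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(4, 8))             # 4 objects, 8-d embeddings
    gt_dims = rng.uniform(0.5, 4.0, (4, 3))   # made-up object dimensions
    keys = rng.normal(size=(5, 8))            # 5 hypothetical templates
    tmpl_dims = rng.uniform(0.5, 4.0, (5, 3))
    init_dims, attn = attentive_dimensions(emb, keys, tmpl_dims)
    print(pairwise_distance_loss(emb, gt_dims), init_dims.shape, attn.shape)
```

Because each initial estimate here is a convex combination of the template dimensions, it always stays within the range the templates span. That is one plausible reading of why such an estimate could serve as a stable starting point for later refinement.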

Cite

Text

Zhang et al. "Dimension Embeddings for Monocular 3D Object Detection." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00164

Markdown

[Zhang et al. "Dimension Embeddings for Monocular 3D Object Detection." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/zhang2022cvpr-dimension/) doi:10.1109/CVPR52688.2022.00164

BibTeX

@inproceedings{zhang2022cvpr-dimension,
  title     = {{Dimension Embeddings for Monocular 3D Object Detection}},
  author    = {Zhang, Yunpeng and Zheng, Wenzhao and Zhu, Zheng and Huang, Guan and Du, Dalong and Zhou, Jie and Lu, Jiwen},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {1589--1598},
  doi       = {10.1109/CVPR52688.2022.00164},
  url       = {https://mlanthology.org/cvpr/2022/zhang2022cvpr-dimension/}
}