PETR: Position Embedding Transformation for Multi-View 3D Object Detection

Abstract

In this paper, we develop position embedding transformation (PETR) for multi-view 3D object detection. PETR encodes the position information of 3D coordinates into image features, producing 3D position-aware features. Object queries can then perceive the 3D position-aware features and perform end-to-end object detection. As of submission, PETR achieves state-of-the-art performance (50.4% NDS and 44.1% mAP) on the standard nuScenes dataset and ranks 1st on the benchmark. It can serve as a simple yet strong baseline for future research.
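The core idea in the abstract can be sketched in a few lines: back-project a camera-frustum grid into 3D space, map each pixel's column of 3D coordinates to an embedding, and add that embedding to the 2D image features. The sketch below is a minimal illustration, not the authors' code; the shapes, the fake projection matrix `proj`, and the random linear layer standing in for the paper's MLP are all assumptions.

```python
import numpy as np

H, W, D, C = 4, 4, 8, 16  # feature-map height/width, depth bins, channels

# 1) Build a camera-frustum grid of (u*d, v*d, d) points, one depth column per pixel.
u, v, d = np.meshgrid(np.arange(W), np.arange(H),
                      np.linspace(1.0, 60.0, D), indexing="xy")
frustum = np.stack([u * d, v * d, d, np.ones_like(d)], axis=-1)  # (H, W, D, 4)

# 2) Back-project to 3D coordinates with an inverse projection.
#    `proj` is a made-up intrinsic matrix purely for illustration.
proj = np.eye(4)
proj[0, 0] = proj[1, 1] = 500.0  # fake focal lengths
points_3d = frustum @ np.linalg.inv(proj).T  # (H, W, D, 4)

# 3) Flatten each pixel's depth column and map it to a C-dim position embedding;
#    a random linear layer stands in for the learned MLP in the paper.
coords = points_3d[..., :3].reshape(H, W, D * 3)
rng = np.random.default_rng(0)
mlp_weight = rng.standard_normal((D * 3, C)) * 0.01
pos_embed = coords @ mlp_weight  # (H, W, C)

# 4) Add to the 2D image features to obtain 3D position-aware features,
#    which object queries then attend to in the transformer decoder.
img_feats = rng.standard_normal((H, W, C))
aware_feats = img_feats + pos_embed
print(aware_feats.shape)
```

In the actual method this is done per camera view, so queries defined in a shared 3D space can attend across all views through the shared 3D position encoding.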

Cite

Text

Liu et al. "PETR: Position Embedding Transformation for Multi-View 3D Object Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19812-0_31

Markdown

[Liu et al. "PETR: Position Embedding Transformation for Multi-View 3D Object Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/liu2022eccv-petr/) doi:10.1007/978-3-031-19812-0_31

BibTeX

@inproceedings{liu2022eccv-petr,
  title     = {{PETR: Position Embedding Transformation for Multi-View 3D Object Detection}},
  author    = {Liu, Yingfei and Wang, Tiancai and Zhang, Xiangyu and Sun, Jian},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19812-0_31},
  url       = {https://mlanthology.org/eccv/2022/liu2022eccv-petr/}
}