PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
Abstract
In this paper, we propose PETRv2, a unified framework for 3D perception from multi-view images. Based on PETR, PETRv2 explores the effectiveness of temporal modeling, which utilizes the temporal information of previous frames to boost 3D object detection. More specifically, we extend the 3D position embedding (3D PE) in PETR for temporal modeling. The 3D PE achieves temporal alignment of object positions across different frames. To support multi-task learning (e.g., BEV segmentation and 3D lane detection), PETRv2 provides a simple yet effective solution by introducing task-specific queries, which are initialized under different spaces. PETRv2 achieves state-of-the-art performance on 3D object detection, BEV segmentation and 3D lane detection. A detailed robustness analysis is also conducted on the PETR framework. Code is available at https://github.com/megvii-research/PETR.
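As a rough illustration of the temporal alignment described in the abstract, the sketch below maps 3D points of a previous frame into the current ego coordinate system before 3D position embeddings are generated. This is a minimal sketch, not the authors' implementation: it assumes 4x4 ego-to-global pose matrices (`ego_pose_prev`, `ego_pose_cur`) in the style of nuScenes, and the function name and signature are hypothetical.

```python
# Illustrative sketch (not the PETR codebase): align 3D coordinates of frame t-1
# into the ego coordinate system of frame t before generating 3D PE.
import torch

def align_coords_to_current_frame(coords_prev, ego_pose_prev, ego_pose_cur):
    """Map 3D points from the ego frame at t-1 into the ego frame at t.

    coords_prev:   (N, 3) 3D points expressed in the ego frame of t-1
    ego_pose_prev: (4, 4) ego-to-global transform at t-1 (assumed available)
    ego_pose_cur:  (4, 4) ego-to-global transform at t   (assumed available)
    """
    ones = torch.ones_like(coords_prev[:, :1])
    homo = torch.cat([coords_prev, ones], dim=-1)             # (N, 4) homogeneous
    # previous ego frame -> global frame -> current ego frame
    to_cur = torch.linalg.inv(ego_pose_cur) @ ego_pose_prev   # (4, 4)
    aligned = homo @ to_cur.T                                  # (N, 4)
    return aligned[:, :3]
```

Once the points of both frames share one coordinate system, the same 3D position encoder can embed them consistently, which is what the abstract refers to as temporal alignment via 3D PE.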
Cite
Text
Liu et al. "PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00302
Markdown
[Liu et al. "PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/liu2023iccv-petrv2/) doi:10.1109/ICCV51070.2023.00302
BibTeX
@inproceedings{liu2023iccv-petrv2,
title = {{PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images}},
author = {Liu, Yingfei and Yan, Junjie and Jia, Fan and Li, Shuailin and Gao, Aqi and Wang, Tiancai and Zhang, Xiangyu},
booktitle = {International Conference on Computer Vision},
year = {2023},
pages = {3262-3272},
doi = {10.1109/ICCV51070.2023.00302},
url = {https://mlanthology.org/iccv/2023/liu2023iccv-petrv2/}
}