DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences

Abstract

Camera-based Bird’s-Eye-View (BEV) perception often struggles to choose between 3D-to-2D and 2D-to-3D view transformation (VT). The 3D-to-2D VT typically employs a resource-intensive Transformer to establish robust correspondences between 3D and 2D features, while the 2D-to-3D VT utilizes the Lift-Splat-Shoot (LSS) pipeline for real-time application, potentially missing distant information. To address these limitations, we propose DualBEV, a unified framework that utilizes a shared feature transformation incorporating three probabilistic measurements for both strategies. By considering dual-view correspondences in one stage, DualBEV effectively bridges the gap between these strategies, harnessing their individual strengths. Our method achieves state-of-the-art performance without a Transformer, delivering efficiency comparable to the LSS approach, with 55.2% mAP and 63.4% NDS on the nuScenes test set. Code is available at https://github.com/PeidongLi/DualBEV.

Cite

Text

Li et al. "DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72907-2_17

Markdown

[Li et al. "DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/li2024eccv-dualbev/) doi:10.1007/978-3-031-72907-2_17

BibTeX

@inproceedings{li2024eccv-dualbev,
  title     = {{DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences}},
  author    = {Li, Peidong and Shen, Wancheng and Huang, Qihao and Cui, Dixiao},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72907-2_17},
  url       = {https://mlanthology.org/eccv/2024/li2024eccv-dualbev/}
}