SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction
Abstract
Monocular 3D reconstruction for categorical objects heavily relies on accurately perceiving each object’s pose. While gradient-based optimization in a NeRF framework updates the initial pose, this paper highlights that scale-depth ambiguity in monocular object reconstruction causes failures when the initial pose deviates moderately from the true pose. Consequently, existing methods often depend on a third-party 3D object to provide an initial object pose, leading to increased complexity and generalization issues. To address these challenges, we present SUP-NeRF, a Streamlined Unification of object Pose estimation and NeRF-based object reconstruction. SUP-NeRF decouples the object’s dimension estimation and pose refinement to resolve the scale-depth ambiguity, and introduces a camera-invariant projected-box representation that generalizes cross different domains. While using a dedicated pose estimator that smoothly integrates into an object-centric NeRF, SUP-NeRF is free from external 3D detectors. SUP-NeRF achieves state-of-the-art results in both reconstruction and pose estimation tasks on the nuScenes dataset. Furthermore, SUP-NeRF exhibits exceptional cross-dataset generalization on the KITTI and Waymo datasets, surpassing prior methods with up to 50% reduction in rotation and translation error.
Cite
Text
Guo et al. "SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72890-7_3Markdown
[Guo et al. "SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/guo2024eccv-supnerf/) doi:10.1007/978-3-031-72890-7_3BibTeX
@inproceedings{guo2024eccv-supnerf,
title = {{SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction}},
author = {Guo, Yuliang and Kumar, Abhinav and Zhao, Cheng and Wang, Ruoyu and Huang, Xinyu and Ren, Liu},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72890-7_3},
url = {https://mlanthology.org/eccv/2024/guo2024eccv-supnerf/}
}