SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation

Chen, Yamei; Di, Yan; Zhai, Guangyao; Manhardt, Fabian; Zhang, Chenyangguang; Zhang, Ruida; Tombari, Federico; Navab, Nassir; Busam, Benjamin

doi:10.1109/CVPR52733.2024.00950

CVPR 2024 pp. 9959-9969

Abstract

Category-level object pose estimation, aiming to predict the 6D pose and 3D size of objects from known categories, typically struggles with large intra-class shape variation. Existing works utilizing mean shapes often fall short of capturing this variation. To address this issue, we present SecondPose, a novel approach integrating object-specific geometric features with semantic category priors from DINOv2. Leveraging the advantage of DINOv2 in providing SE(3)-consistent semantic features, we hierarchically extract two types of SE(3)-invariant geometric features to further encapsulate local-to-global object-specific information. These geometric features are then point-aligned with DINOv2 features to establish a consistent object representation under SE(3) transformations, facilitating the mapping from camera space to the pre-defined canonical space, thus further enhancing pose estimation. Extensive experiments on NOCS-REAL275 demonstrate that SecondPose achieves a 12.4% leap forward over the state-of-the-art. Moreover, on a more complex dataset, HouseCat6D, which provides photometrically challenging objects, SecondPose still surpasses other competitors by a large margin. Code is released at https://github.com/NOrangeeroli/SecondPose.git.
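
To make the notion of an SE(3)-invariant geometric feature concrete, here is a minimal sketch. It is not the SecondPose implementation: the descriptor (sorted distances to the k nearest neighbours), the function names, and the all-zero stand-in for per-point semantic features are assumptions chosen for illustration. It shows that a distance-based per-point descriptor is unchanged by a rigid transform, and that such a descriptor can be concatenated point by point with a semantic feature vector.

```python
import math
import random


def rotation_about_axis(axis, angle):
    """Rodrigues' formula: 3x3 rotation matrix for an axis (normalised here) and an angle in radians."""
    norm = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / norm for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return [
        [c + x * x * C, x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]


def apply_se3(points, R, t):
    """Apply the rigid transform p -> R p + t to every 3D point."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def knn_distance_descriptor(points, k=4):
    """Illustrative per-point descriptor: sorted distances to the k nearest other points.

    Rigid transforms preserve pairwise distances, so this descriptor is SE(3)-invariant.
    """
    descriptors = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        descriptors.append(dists[:k])
    return descriptors


def fuse_pointwise(geometric, semantic):
    """Concatenate the geometric and semantic feature vectors of each point."""
    return [g + s for g, s in zip(geometric, semantic)]


random.seed(0)
points = [tuple(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(32)]

# An arbitrary rigid transform (rotation about a tilted axis plus a translation).
R = rotation_about_axis((1.0, 2.0, 3.0), 0.7)
t = (0.5, -0.2, 1.0)
moved = apply_se3(points, R, t)

desc_original = knn_distance_descriptor(points)
desc_moved = knn_distance_descriptor(moved)

# Largest change in any descriptor entry; only floating-point round-off remains.
max_diff = max(
    abs(a - b)
    for da, db in zip(desc_original, desc_moved)
    for a, b in zip(da, db)
)

# Hypothetical placeholder for per-point semantic features (e.g. from an image backbone).
semantic = [[0.0] * 8 for _ in points]
fused = fuse_pointwise(desc_original, semantic)
```

Here `max_diff` stays at the level of floating-point round-off, and each entry of `fused` holds 4 geometric values followed by 8 semantic values for the same point.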

Cite

Text

Chen et al. "SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00950

Markdown

[Chen et al. "SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/chen2024cvpr-secondpose/) doi:10.1109/CVPR52733.2024.00950

BibTeX

@inproceedings{chen2024cvpr-secondpose,
  title     = {{SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation}},
  author    = {Chen, Yamei and Di, Yan and Zhai, Guangyao and Manhardt, Fabian and Zhang, Chenyangguang and Zhang, Ruida and Tombari, Federico and Navab, Nassir and Busam, Benjamin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {9959--9969},
  doi       = {10.1109/CVPR52733.2024.00950},
  url       = {https://mlanthology.org/cvpr/2024/chen2024cvpr-secondpose/}
}