SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation
Abstract
Category-level object pose estimation, aiming to predict the 6D pose and 3D size of objects from known categories, typically struggles with large intra-class shape variation. Existing works utilizing mean shapes often fall short of capturing this variation. To address this issue, we present SecondPose, a novel approach integrating object-specific geometric features with semantic category priors from DINOv2. Leveraging the advantage of DINOv2 in providing SE(3)-consistent semantic features, we hierarchically extract two types of SE(3)-invariant geometric features to further encapsulate local-to-global object-specific information. These geometric features are then point-aligned with DINOv2 features to establish a consistent object representation under SE(3) transformations, facilitating the mapping from camera space to the pre-defined canonical space, thus further enhancing pose estimation. Extensive experiments on NOCS-REAL275 demonstrate that SecondPose achieves a 12.4% leap forward over the state-of-the-art. Moreover, on a more complex dataset, HouseCat6D, which provides photometrically challenging objects, SecondPose still surpasses other competitors by a large margin. Code is released at https://github.com/NOrangeeroli/SecondPose.git.
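To make the notion of SE(3)-invariant, point-aligned features concrete, the following is a minimal sketch and not the descriptors used in SecondPose. It assumes a hypothetical geometric descriptor (sorted distances to each point's k nearest neighbours) and uses random vectors as a stand-in for per-point DINOv2 features, which are assumed here to be exactly unchanged by the transformation. The names `knn_distance_features` and `fuse` are illustrative only.

```python
import numpy as np

def random_rotation(rng):
    # QR decomposition of a random matrix yields an orthogonal matrix;
    # flip one column if needed so that det = +1 (a proper rotation).
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def knn_distance_features(points, k=8):
    # Hypothetical descriptor: sorted distances from each point to its
    # k nearest neighbours. Distances are preserved by rigid motions.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.sort(dist, axis=1)[:, 1:k + 1]  # drop the zero self-distance

def fuse(geometric, semantic):
    # Point-aligned fusion: row i of both arrays describes the same point.
    return np.concatenate([geometric, semantic], axis=1)

rng = np.random.default_rng(0)
points = rng.normal(size=(64, 3))      # toy object point cloud
semantic = rng.normal(size=(64, 16))   # stand-in for per-point DINOv2 features

# Apply an arbitrary rigid transformation (rotation R, translation t).
R, t = random_rotation(rng), rng.normal(size=3)
moved = points @ R.T + t

f_cam = fuse(knn_distance_features(points), semantic)
f_moved = fuse(knn_distance_features(moved), semantic)
invariant = np.allclose(f_cam, f_moved)
```

Under these assumptions, `invariant` evaluates to true: the fused per-point representation does not depend on the object's pose, which is the property the abstract relies on when mapping from camera space to canonical space.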
Cite
Text
Chen et al. "SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00950
Markdown
[Chen et al. "SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/chen2024cvpr-secondpose/) doi:10.1109/CVPR52733.2024.00950
BibTeX
@inproceedings{chen2024cvpr-secondpose,
title = {{SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation}},
author = {Chen, Yamei and Di, Yan and Zhai, Guangyao and Manhardt, Fabian and Zhang, Chenyangguang and Zhang, Ruida and Tombari, Federico and Navab, Nassir and Busam, Benjamin},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {9959-9969},
doi = {10.1109/CVPR52733.2024.00950},
url = {https://mlanthology.org/cvpr/2024/chen2024cvpr-secondpose/}
}