VI-Net: Boosting Category-Level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations

Abstract

High-precision rotation estimation from an RGB-D object observation is a major challenge in 6D object pose estimation, due to the difficulty of learning in the non-linear space of SO(3). In this paper, we propose a novel rotation estimation network, termed VI-Net, that makes the task easier by decoupling the rotation into the combination of a viewpoint rotation and an in-plane rotation. More specifically, VI-Net bases its feature learning on the sphere, with two individual branches for estimating the two factorized rotations: a V-Branch learns the viewpoint rotation via binary classification on the spherical signals, while an I-Branch estimates the in-plane rotation by transforming the signals so that they are viewed from the zenith direction. To process the spherical signals, a Spherical Feature Pyramid Network is constructed based on a novel design of SPAtial Spherical Convolution (SPA-SConv), which settles the boundary problem of spherical signals via feature padding and realizes viewpoint-equivariant feature extraction by symmetric convolutional operations. We apply the proposed VI-Net to the challenging task of category-level 6D object pose estimation, which predicts the poses of unknown objects without available CAD models; experiments on the benchmark datasets confirm the efficacy of our method, which outperforms existing ones by a large margin in the high-precision regime.
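The decoupling described in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper's code: it assumes the factorization follows a ZYZ Euler convention, so that the viewpoint rotation is `rot_z(phi) @ rot_y(theta)` (moving the zenith axis to a point on the sphere) and the in-plane rotation is `rot_z(psi)` about the zenith. The function names `rot_z`, `rot_y`, and `decouple_rotation` are hypothetical and used only for illustration.

```python
import numpy as np

def rot_z(a):
    """Rotation by angle a (radians) about the z (zenith) axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(a):
    """Rotation by angle a (radians) about the y axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def decouple_rotation(R):
    """Split R into R_vp @ R_ip (assumed ZYZ convention, illustrative only).

    R_vp is a viewpoint rotation taking the zenith axis to R[:, 2];
    R_ip is an in-plane rotation about the zenith axis.
    """
    v = R[:, 2]                                  # image of the zenith axis under R
    theta = np.arccos(np.clip(v[2], -1.0, 1.0))  # polar angle in [0, pi]
    phi = np.arctan2(v[1], v[0])                 # azimuth angle
    R_vp = rot_z(phi) @ rot_y(theta)
    R_rest = R_vp.T @ R                          # leaves the zenith axis fixed
    psi = np.arctan2(R_rest[1, 0], R_rest[0, 0])
    return R_vp, rot_z(psi), (theta, phi, psi)

# Usage: compose a rotation from known angles and recover the two factors.
R = rot_z(0.3) @ rot_y(1.1) @ rot_z(-0.7)
R_vp, R_ip, angles = decouple_rotation(R)
print(np.allclose(R_vp @ R_ip, R))
```

Under this assumed convention, the viewpoint factor depends only on two angles (a point on the sphere) and the in-plane factor on a single angle, which is the sense in which the two estimation problems become separable.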

Cite

Text

Lin et al. "VI-Net: Boosting Category-Level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01287

Markdown

[Lin et al. "VI-Net: Boosting Category-Level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/lin2023iccv-vinet/) doi:10.1109/ICCV51070.2023.01287

BibTeX

@inproceedings{lin2023iccv-vinet,
  title     = {{VI-Net: Boosting Category-Level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations}},
  author    = {Lin, Jiehong and Wei, Zewei and Zhang, Yabin and Jia, Kui},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {14001-14011},
  doi       = {10.1109/ICCV51070.2023.01287},
  url       = {https://mlanthology.org/iccv/2023/lin2023iccv-vinet/}
}