WeLSA: Learning to Predict 6d Pose from Weakly Labeled Data Using Shape Alignment

Abstract

Object pose estimation is a crucial task in computer vision and augmented reality. One of its key challenges is the difficulty of annotating real training data and the lack of textured CAD models. Pipelines that do not require CAD models and that can be trained with few labeled images are therefore desirable. We propose a weakly-supervised approach for object pose estimation from RGB-D data using training sets composed of very few images with pose annotations along with weakly-labeled images that have ground-truth segmentation masks but no pose labels. We achieve this by learning to annotate the weakly-labeled training data through shape alignment while simultaneously training a pose prediction network. Point cloud alignment is performed using structure-based and rotation-invariant feature-based losses. We further learn an implicit shape representation, which allows the method to work without a known CAD model and also contributes to pose alignment and pose refinement during training on weakly labeled images. The experimental evaluation shows that our method achieves state-of-the-art results on LineMOD, Occlusion-LineMOD and T-Less despite being trained using relative poses and on only a fraction of the labeled data used by other methods. We also achieve results comparable to state-of-the-art RGB-D based pose estimation approaches even when further reducing the amount of unlabeled training data. In addition, our method works even if relative camera poses are given instead of object pose annotations, which are typically easier to obtain.
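The abstract does not spell out the alignment losses; as an illustrative sketch only (not the authors' implementation), a structure term can be a symmetric Chamfer distance between the two point clouds, while a rotation-invariant feature term can compare, for example, the sorted distances of points to their cloud's centroid, which are unchanged by any rigid rotation. The function names and the weight `w` below are hypothetical choices for the sketch:

```python
import numpy as np

def centroid_distance_feature(points):
    # Rotation-invariant per-point feature: each point's distance to the
    # cloud's centroid is preserved under any rotation of the cloud.
    centroid = points.mean(axis=0)
    return np.linalg.norm(points - centroid, axis=1)

def chamfer_distance(a, b):
    # Symmetric Chamfer distance between point clouds a (N, 3) and b (M, 3):
    # mean nearest-neighbor distance in both directions.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def alignment_loss(src, tgt, w=0.1):
    # Structure term (Chamfer) plus a rotation-invariant feature term
    # comparing sorted centroid-distance profiles of the two clouds.
    feat_src = np.sort(centroid_distance_feature(src))
    feat_tgt = np.sort(centroid_distance_feature(tgt))
    n = min(len(feat_src), len(feat_tgt))
    feat_term = np.abs(feat_src[:n] - feat_tgt[:n]).mean()
    return chamfer_distance(src, tgt) + w * feat_term
```

The feature term gives a signal that does not depend on the current (possibly wrong) rotation estimate, which is one motivation for combining it with a purely structural distance.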

Cite

Text

Vutukur et al. "WeLSA: Learning to Predict 6d Pose from Weakly Labeled Data Using Shape Alignment." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-20074-8_37

Markdown

[Vutukur et al. "WeLSA: Learning to Predict 6d Pose from Weakly Labeled Data Using Shape Alignment." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/vutukur2022eccv-welsa/) doi:10.1007/978-3-031-20074-8_37

BibTeX

@inproceedings{vutukur2022eccv-welsa,
  title     = {{WeLSA: Learning to Predict 6d Pose from Weakly Labeled Data Using Shape Alignment}},
  author    = {Vutukur, Shishir Reddy and Shugurov, Ivan and Busam, Benjamin and Hutter, Andreas and Ilic, Slobodan},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-20074-8_37},
  url       = {https://mlanthology.org/eccv/2022/vutukur2022eccv-welsa/}
}