OSOP: A Multi-Stage One Shot Object Pose Estimation Framework
Abstract
We present a novel one-shot method for object detection and 6 DoF pose estimation that does not require training on target objects. At test time, it takes as input a target image and a textured 3D query model. The core idea is to represent a 3D model with a number of 2D templates rendered from different viewpoints. This enables CNN-based direct dense feature extraction and matching. The object is first localized in 2D, then its approximate viewpoint is estimated, followed by dense 2D-3D correspondence prediction. The final pose is computed with PnP. We evaluate the method on the LineMOD, Occlusion, Homebrewed, YCB-V and TLESS datasets and report very competitive performance compared to state-of-the-art methods trained on synthetic data, even though our method is not trained on the object models used for testing.
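The final stage of the pipeline turns the predicted dense 2D-3D correspondences into a 6 DoF pose with PnP inside a RANSAC loop. Below is a minimal sketch of that step, not the authors' code: it assumes the correspondences and camera intrinsics are already given, and the function name `pose_from_correspondences`, the EPnP solver choice, and the reprojection threshold are illustrative assumptions.

```python
import numpy as np
import cv2

def pose_from_correspondences(pts_3d, pts_2d, K):
    """Recover a 6 DoF pose from dense 2D-3D correspondences via PnP + RANSAC.

    pts_3d: (N, 3) points on the query model, pts_2d: (N, 2) matched image
    points, K: (3, 3) camera intrinsics. Returns rotation R and translation t.
    """
    # RANSAC rejects outlier matches; EPnP solves pose on each sampled subset.
    # The 3 px reprojection threshold is an assumed, tunable value.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float64),
        pts_2d.astype(np.float64),
        K.astype(np.float64),
        None,  # no lens distortion assumed
        flags=cv2.SOLVEPNP_EPNP,
        reprojectionError=3.0,
    )
    if not ok:
        raise RuntimeError("PnP failed: too few consistent correspondences")
    R, _ = cv2.Rodrigues(rvec)  # axis-angle vector -> 3x3 rotation matrix
    return R, tvec.reshape(3)
```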
Cite
Text
Shugurov et al. "OSOP: A Multi-Stage One Shot Object Pose Estimation Framework." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00671

Markdown
[Shugurov et al. "OSOP: A Multi-Stage One Shot Object Pose Estimation Framework." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/shugurov2022cvpr-osop/) doi:10.1109/CVPR52688.2022.00671

BibTeX
@inproceedings{shugurov2022cvpr-osop,
title = {{OSOP: A Multi-Stage One Shot Object Pose Estimation Framework}},
author = {Shugurov, Ivan and Li, Fu and Busam, Benjamin and Ilic, Slobodan},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2022},
pages = {6835--6844},
doi = {10.1109/CVPR52688.2022.00671},
url = {https://mlanthology.org/cvpr/2022/shugurov2022cvpr-osop/}
}