InstancePose: Fast 6DoF Pose Estimation for Multiple Objects from a Single RGB Image

Abstract

The usefulness of a 6DoF object pose estimation method depends on its positional accuracy, implementation complexity, and processing speed. This study presents a fast and accurate method for estimating the 6DoF poses of multiple object instances in a single RGB image. The proposed method uses a deep neural network that outputs four types of feature maps: the error object mask, semantic object masks, center vector maps (CVM), and 6D coordinate maps. In post-processing, these feature maps are combined to detect objects and to build 2D-3D correspondences for multiple objects in parallel, which are then used for PnP-RANSAC pose estimation. Experiments show that the method processes input RGB images containing 7 different object categories/instances at 25 frames per second, with accuracy competitive with current state-of-the-art methods, which focus only on specific conditions.
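The post-processing step described above (grouping pixels into object instances, then reading off 2D-3D correspondences for PnP-RANSAC) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that do not come from the paper: each center vector map pixel is taken to store a 2D offset (dx, dy) from that pixel to its object's 2D center; the first three channels of the coordinate map are taken to be object-frame XYZ coordinates; the error object mask is not modeled; and the greedy voting scheme, the `radius` threshold, and the helper names `group_instances` and `correspondences` are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def group_instances(class_mask, offsets, class_id, radius=5.0):
    """Split the pixels of one class into instances by voting for 2D centers.

    class_mask: (H, W) int array of per-pixel class labels (0 = background).
    offsets:    (H, W, 2) array; offsets[y, x] = (dx, dy) from the pixel to
                its object's 2D center (assumed CVM encoding).
    Returns a list of (center_xy, pixels_xy) tuples, pixels_xy of shape (N, 2).
    """
    ys, xs = np.nonzero(class_mask == class_id)
    if xs.size == 0:
        return []
    pix = np.stack([xs, ys], axis=1).astype(float)
    votes = pix + offsets[ys, xs]  # each pixel's predicted object center
    unassigned = np.ones(len(votes), dtype=bool)
    instances = []
    while unassigned.any():
        idx = np.flatnonzero(unassigned)
        # Seed with the remaining vote that has the most neighbours
        # (O(n^2) pairwise distances, acceptable for a sketch).
        d = np.linalg.norm(votes[idx, None, :] - votes[None, idx, :], axis=2)
        seed = idx[np.argmax((d < radius).sum(axis=1))]
        # All remaining pixels whose vote lies near the seed form one instance.
        near = np.linalg.norm(votes[idx] - votes[seed], axis=1) < radius
        members = idx[near]
        instances.append((votes[members].mean(axis=0), pix[members]))
        unassigned[members] = False
    return instances


def correspondences(pixels_xy, coord_map):
    """Pair each instance pixel (2D) with its predicted object-frame point (3D)."""
    xs = pixels_xy[:, 0].astype(int)
    ys = pixels_xy[:, 1].astype(int)
    pts3d = coord_map[ys, xs, :3]  # assumed: first 3 channels are XYZ
    return pixels_xy, pts3d
```

Each instance's (image points, object points) pair has the shape a PnP-RANSAC solver such as OpenCV's `cv2.solvePnPRansac(objectPoints, imagePoints, cameraMatrix, distCoeffs)` accepts, and because instances are independent, the per-instance solves can run in parallel. The greedy center voting stands in for whatever clustering the paper actually uses.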

Cite

Text

Aing et al. "InstancePose: Fast 6DoF Pose Estimation for Multiple Objects from a Single RGB Image." IEEE/CVF International Conference on Computer Vision Workshops, 2021. doi:10.1109/ICCVW54120.2021.00296

Markdown

[Aing et al. "InstancePose: Fast 6DoF Pose Estimation for Multiple Objects from a Single RGB Image." IEEE/CVF International Conference on Computer Vision Workshops, 2021.](https://mlanthology.org/iccvw/2021/aing2021iccvw-instancepose/) doi:10.1109/ICCVW54120.2021.00296

BibTeX

@inproceedings{aing2021iccvw-instancepose,
  title     = {{InstancePose: Fast 6DoF Pose Estimation for Multiple Objects from a Single RGB Image}},
  author    = {Aing, Lee and Lie, Wen-Nung and Chiang, Jui-Chiu and Lin, Guo-Shiang},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2021},
  pages     = {2621-2630},
  doi       = {10.1109/ICCVW54120.2021.00296},
  url       = {https://mlanthology.org/iccvw/2021/aing2021iccvw-instancepose/}
}