Trans6D: Transformer-Based 6D Object Pose Estimation and Refinement

Abstract

Estimating 6D object pose from a monocular RGB image remains challenging due to factors such as texture-less surfaces and occlusion. Although convolutional neural network (CNN)-based methods have made remarkable progress, they are not efficient at capturing global dependencies and often suffer from information loss due to downsampling operations. To extract robust feature representations, we propose a Transformer-based 6D object pose estimation approach (Trans6D). Specifically, we first build two strong Transformer-based baselines and compare their performance: pure Transformers following ViT (Trans6D-pure) and hybrid Transformers integrating CNNs with Transformers (Trans6D-hybrid). Furthermore, two novel modules are proposed to make Trans6D-pure more accurate and robust: (i) a patch-aware feature fusion module, which decreases the number of tokens without information loss via shifted windows, cross-attention, and token pooling operations, and is used to predict dense 2D-3D correspondence maps; (ii) a pure Transformer-based pose refinement module (Trans6D+), which refines the estimated poses iteratively. Extensive experiments show that the proposed approach achieves state-of-the-art performance on two datasets.
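To illustrate the general idea behind reducing token count with cross-attention and token pooling (this is only a minimal numpy sketch of the concept, not the paper's patch-aware feature fusion module; the shifted-window step and all dimensions are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_pool(tokens, pool=2):
    # average adjacent tokens: (N, d) -> (N // pool, d)
    n, d = tokens.shape
    return tokens[: n - n % pool].reshape(-1, pool, d).mean(axis=1)

def cross_attention(queries, keys_values):
    # queries: (M, d), keys_values: (N, d); with M < N the output
    # has fewer tokens while still attending over all input tokens
    d = queries.shape[-1]
    attn = softmax(queries @ keys_values.T / np.sqrt(d))
    return attn @ keys_values

# toy example: 16 patch tokens of dimension 8
tokens = np.random.default_rng(0).normal(size=(16, 8))
pooled = token_pool(tokens)               # coarse tokens serve as queries
fused = cross_attention(pooled, tokens)   # attend back over all fine tokens
print(fused.shape)  # (8, 8): half the tokens, same feature dimension
```

The point of the sketch is that pooled (coarse) tokens query the full (fine) token set, so token count drops while each surviving token still aggregates information from every input patch.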

Cite

Text

Zhang et al. "Trans6D: Transformer-Based 6D Object Pose Estimation and Refinement." European Conference on Computer Vision Workshops, 2022. doi:10.1007/978-3-031-25085-9_7

Markdown

[Zhang et al. "Trans6D: Transformer-Based 6D Object Pose Estimation and Refinement." European Conference on Computer Vision Workshops, 2022.](https://mlanthology.org/eccvw/2022/zhang2022eccvw-trans6d/) doi:10.1007/978-3-031-25085-9_7

BibTeX

@inproceedings{zhang2022eccvw-trans6d,
  title     = {{Trans6D: Transformer-Based 6D Object Pose Estimation and Refinement}},
  author    = {Zhang, Zhongqun and Chen, Wei and Zheng, Linfang and Leonardis, Ales and Chang, Hyung Jin},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2022},
  pages     = {112--128},
  doi       = {10.1007/978-3-031-25085-9_7},
  url       = {https://mlanthology.org/eccvw/2022/zhang2022eccvw-trans6d/}
}