HandGCNFormer: A Novel Topology-Aware Transformer Network for 3D Hand Pose Estimation

Abstract

Despite substantial progress in 3D hand pose estimation, inferring plausible and accurate poses under severe self-occlusion and high self-similarity remains an inherent challenge. To mitigate the ambiguity arising from invisible and similar joints, we propose a novel Topology-aware Transformer network named HandGCNFormer, which incorporates prior knowledge of hand kinematic topology into the network while modeling long-range contextual information. Specifically, we present a novel Graphformer decoder with an additional node-offset graph convolutional layer (NoffGConv) that optimizes the synergy between the Transformer and GCN, capturing long-range dependencies as well as local topological connections between joints. Furthermore, we replace the standard MLP prediction head with a novel Topology-aware head to better exploit local topology constraints for more plausible and accurate poses. Our method achieves state-of-the-art performance on four challenging datasets: Hands2017, NYU, ICVL, and MSRA.
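
To make the decoder design concrete, below is a minimal PyTorch sketch of what a node-offset graph convolution might look like: a standard GCN aggregation over the hand-skeleton adjacency plus a learnable per-joint offset that lets each joint deviate from its neighborhood average. The class name, the offset formulation, and the symmetric normalization here are illustrative assumptions based on the abstract, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class NoffGConv(nn.Module):
    """Sketch of a node-offset graph convolution layer.

    Aggregates per-joint features over the hand-skeleton graph
    (standard GCN step) and adds a learnable per-joint offset.
    The exact formulation in the paper may differ; this is an
    assumption-based reading of the abstract.
    """

    def __init__(self, num_joints: int, in_dim: int, out_dim: int,
                 adjacency: torch.Tensor):
        super().__init__()
        # Symmetrically normalized adjacency with self-loops:
        # D^{-1/2} (A + I) D^{-1/2}.
        a = adjacency + torch.eye(num_joints)
        d = a.sum(dim=-1).rsqrt()
        self.register_buffer("a_norm", d[:, None] * a * d[None, :])
        self.linear = nn.Linear(in_dim, out_dim)
        # Hypothetical per-joint offset parameters (the "node offset").
        self.node_offset = nn.Parameter(torch.zeros(num_joints, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_joints, in_dim) joint tokens, e.g. the output
        # of a Transformer decoder layer.
        x = self.a_norm @ self.linear(x)  # neighborhood aggregation
        return x + self.node_offset       # per-joint offset term


# Toy usage: a 5-joint kinematic chain (wrist -> 4 finger joints).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
adj = torch.zeros(5, 5)
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

layer = NoffGConv(num_joints=5, in_dim=64, out_dim=64, adjacency=adj)
out = layer(torch.randn(2, 5, 64))  # -> (2, 5, 64)
```

In a Graphformer-style decoder, such a layer would sit alongside self-attention, complementing attention's global token mixing with the skeleton's local connectivity.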

Cite

Text

Wang et al. "HandGCNFormer: A Novel Topology-Aware Transformer Network for 3D Hand Pose Estimation." Winter Conference on Applications of Computer Vision, 2023.

Markdown

[Wang et al. "HandGCNFormer: A Novel Topology-Aware Transformer Network for 3D Hand Pose Estimation." Winter Conference on Applications of Computer Vision, 2023.](https://mlanthology.org/wacv/2023/wang2023wacv-handgcnformer/)

BibTeX

@inproceedings{wang2023wacv-handgcnformer,
  title     = {{HandGCNFormer: A Novel Topology-Aware Transformer Network for 3D Hand Pose Estimation}},
  author    = {Wang, Yintong and Chen, Lili and Li, Jiamao and Zhang, Xiaolin},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2023},
  pages     = {5675--5684},
  url       = {https://mlanthology.org/wacv/2023/wang2023wacv-handgcnformer/}
}