Neural Voting Field for Camera-Space 3D Hand Pose Estimation

Huang, Lin; Lin, Chung-Ching; Lin, Kevin; Liang, Lin; Wang, Lijuan; Yuan, Junsong; Liu, Zicheng

doi:10.1109/CVPR52729.2023.00866

CVPR 2023 pp. 8969-8978

Abstract

We present a unified framework for camera-space 3D hand pose estimation from a single RGB image, based on a 3D implicit representation. Most recent works first adopt holistic or pixel-level dense regression to obtain a relative 3D hand pose and then apply complex second-stage operations to recover the 3D global root or scale. In contrast, we propose a unified 3D dense regression scheme that estimates camera-space 3D hand pose via dense 3D point-wise voting in the camera frustum. Through direct dense modeling in the 3D domain, inspired by Pixel-aligned Implicit Functions for detailed 3D reconstruction, our proposed Neural Voting Field (NVF) fully models 3D dense local evidence and global hand geometry, helping to alleviate common 2D-to-3D ambiguities. Specifically, for a 3D query point in the camera frustum and its pixel-aligned image feature, NVF, represented by a Multi-Layer Perceptron, regresses (i) the point's signed distance to the hand surface and (ii) a set of 4D offset vectors (a 1D voting weight and a 3D directional vector to each hand joint). Following a vote-casting scheme, the 4D offset vectors from near-surface points are selected to calculate the 3D hand joint coordinates by a weighted average. Experiments demonstrate that NVF outperforms existing state-of-the-art algorithms on the FreiHAND dataset for camera-space 3D hand pose estimation. We also adapt NVF to the classic task of root-relative 3D hand pose estimation, for which it also obtains state-of-the-art results on the HO3D dataset.
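The vote-casting step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the MLP outputs are already available as arrays, that the 3D part of each offset is a full displacement from the query point to the joint, that "near-surface" means an absolute signed distance below a threshold, and that voting weights are normalized per joint. The function name `cast_votes`, the parameter `surface_eps`, and the array layouts are hypothetical.

```python
import numpy as np


def cast_votes(points, sdf, weights, offsets, surface_eps=0.01):
    """Aggregate per-point votes into joint locations (illustrative only).

    points:  (N, 3) query points in camera space
    sdf:     (N,)   predicted signed distance of each point to the hand surface
    weights: (N, J) predicted voting weight of each point for each joint
    offsets: (N, J, 3) predicted displacement from each point to each joint
             (assumption: a full displacement vector, not a unit direction)
    Returns a (J, 3) array of estimated joint coordinates.
    """
    # Keep only the votes cast by points close to the predicted surface.
    near = np.abs(sdf) < surface_eps
    if not near.any():
        raise ValueError("no near-surface query points; increase surface_eps")

    # Each selected point votes for every joint: point + offset.
    votes = points[near, None, :] + offsets[near]  # (M, J, 3)

    # Normalize the weights per joint so they sum to one over the voters.
    w = weights[near]                              # (M, J)
    w = w / w.sum(axis=0, keepdims=True)

    # Weighted average of the votes for each joint.
    return (w[..., None] * votes).sum(axis=0)      # (J, 3)


# Toy usage with synthetic data standing in for network predictions.
rng = np.random.default_rng(0)
joints = rng.uniform(-0.1, 0.1, size=(21, 3))      # 21 hand joints
points = rng.uniform(-0.2, 0.2, size=(500, 3))
sdf = rng.uniform(-0.05, 0.05, size=500)
offsets = joints[None, :, :] - points[:, None, :]  # error-free offsets
weights = rng.uniform(0.1, 1.0, size=(500, 21))
estimated = cast_votes(points, sdf, weights, offsets)
```

Because the synthetic offsets are error-free, every vote lands on the true joint and the weighted average recovers `joints` up to floating-point error. With real predictions, the weights would determine how much each near-surface point contributes to each joint.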

Cite

Text

Huang et al. "Neural Voting Field for Camera-Space 3D Hand Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00866

Markdown

[Huang et al. "Neural Voting Field for Camera-Space 3D Hand Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/huang2023cvpr-neural-a/) doi:10.1109/CVPR52729.2023.00866

BibTeX

@inproceedings{huang2023cvpr-neural-a,
  title     = {{Neural Voting Field for Camera-Space 3D Hand Pose Estimation}},
  author    = {Huang, Lin and Lin, Chung-Ching and Lin, Kevin and Liang, Lin and Wang, Lijuan and Yuan, Junsong and Liu, Zicheng},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {8969--8978},
  doi       = {10.1109/CVPR52729.2023.00866},
  url       = {https://mlanthology.org/cvpr/2023/huang2023cvpr-neural-a/}
}