3D Hand Pose Estimation from RGB Using Privileged Learning with Depth Data

Abstract

This paper proposes a method for 3D hand pose estimation given a large dataset of depth images with joint annotations and a smaller dataset of paired depth and RGB images with joint annotations. We explore different ways of using the depth data at training time to improve the pose estimation accuracy of a network that takes only RGB images as input. By using paired RGB and depth images, we supervise the RGB-based network to learn middle-layer features that mimic those of a network trained on large-scale, accurately annotated depth data. Further, depth data provides accurate foreground masks, which are employed to learn better feature activations in the RGB network. At test time, when only RGB images are available, our method produces accurate 3D hand pose predictions. The method is also shown to perform well on the 2D hand pose estimation task. We validate the approach on three public datasets and compare it to other published methods.
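The abstract describes two forms of privileged supervision from depth: a feature-mimicking term between mid-layer features of a depth-trained teacher and the RGB student, and a term using depth-derived foreground masks to shape feature activations. The sketch below is a minimal PyTorch illustration of how such a combined training loss might be assembled; the `PoseNet` architecture, the loss weights `w_mimic` and `w_mask`, and the specific background-suppression form of the mask term are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseNet(nn.Module):
    """Hypothetical backbone exposing a mid-layer feature map and a 3D pose head."""
    def __init__(self, in_channels, num_joints=21):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, num_joints * 3),
        )

    def forward(self, x):
        feat = self.features(x)   # mid-layer feature map
        pose = self.head(feat)    # (B, num_joints * 3) flattened 3D joints
        return feat, pose

depth_net = PoseNet(in_channels=1)  # teacher, assumed pre-trained on depth data
rgb_net = PoseNet(in_channels=3)    # student, takes only RGB at test time
depth_net.eval()                    # teacher stays frozen during student training

def privileged_loss(rgb, depth, fg_mask, gt_pose, w_mimic=1.0, w_mask=0.1):
    """Combined training loss: pose regression + feature mimicking + mask term.
    Loss weights are placeholder values, not taken from the paper."""
    with torch.no_grad():
        feat_d, _ = depth_net(depth)   # privileged mid-layer features from depth

    feat_r, pose_r = rgb_net(rgb)

    # 1) Regression against ground-truth 3D joint annotations.
    loss_pose = F.mse_loss(pose_r, gt_pose)

    # 2) Mimic the depth teacher's mid-layer features (paired RGB/depth images).
    loss_mimic = F.mse_loss(feat_r, feat_d)

    # 3) Assumed form of the mask term: penalize student activations that fall
    #    outside the depth-derived foreground mask, downsampled to feature size.
    mask = F.interpolate(fg_mask, size=feat_r.shape[-2:], mode="nearest")
    activation = feat_r.abs().mean(dim=1, keepdim=True)
    loss_mask = (activation * (1 - mask)).mean()

    return loss_pose + w_mimic * loss_mimic + w_mask * loss_mask
```

At test time only `rgb_net` is used, so the depth teacher and the mask term impose no inference-time cost; depth serves purely as privileged information during training.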

Cite

Text

Yuan et al. "3D Hand Pose Estimation from RGB Using Privileged Learning with Depth Data." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00348

Markdown

[Yuan et al. "3D Hand Pose Estimation from RGB Using Privileged Learning with Depth Data." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/yuan2019iccvw-3d/) doi:10.1109/ICCVW.2019.00348

BibTeX

@inproceedings{yuan2019iccvw-3d,
  title     = {{3D Hand Pose Estimation from RGB Using Privileged Learning with Depth Data}},
  author    = {Yuan, Shanxin and Stenger, Björn and Kim, Tae-Kyun},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {2866--2873},
  doi       = {10.1109/ICCVW.2019.00348},
  url       = {https://mlanthology.org/iccvw/2019/yuan2019iccvw-3d/}
}