Self-Supervised 3D Hand Pose Estimation Through Training by Fitting

Abstract

We present a self-supervision method for 3D hand pose estimation from depth maps. We begin with a neural network initialized with synthesized data and fine-tune it on real but unlabelled depth maps by minimizing a set of data-fitting terms. By approximating the hand surface with a set of spheres, we design a differentiable hand renderer to align estimates by comparing the rendered and input depth maps. In addition, we place a set of priors, including a data-driven term, to further regulate the estimate's kinematic feasibility. Our method makes highly accurate estimates comparable to current supervised methods, which require large amounts of labelled training samples, thereby advancing the state of the art in unsupervised learning for hand pose estimation.
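The idea of comparing a rendered sphere model against an input depth map can be illustrated with a small sketch. This is not the authors' renderer: it assumes an orthographic camera, a fixed background depth, and an L1 data-fitting term, and it uses NumPy for the forward pass only (a training setup would need an automatic-differentiation framework). The names `render_spheres` and `depth_fitting_loss`, and all parameters, are hypothetical.

```python
import numpy as np

def render_spheres(centers, radii, size=33, extent=1.0, background=10.0):
    """Orthographic depth map of a set of spheres (smaller depth = closer).

    Assumptions (not from the paper): orthographic rays along +z and a
    constant `background` depth for pixels no sphere covers.
    """
    centers = np.asarray(centers, dtype=float)   # (n, 3): x, y, z
    radii = np.asarray(radii, dtype=float)       # (n,)
    # Pixel coordinates on a square grid spanning [-extent, extent].
    axis = np.linspace(-extent, extent, size)
    x, y = np.meshgrid(axis, axis)               # each (size, size)
    # Squared in-plane distance from every pixel to every sphere centre.
    dx = x[None] - centers[:, 0, None, None]
    dy = y[None] - centers[:, 1, None, None]
    d2 = dx ** 2 + dy ** 2                       # (n, size, size)
    r2 = (radii ** 2)[:, None, None]
    inside = d2 < r2
    # Depth of the front surface where the ray hits the sphere.
    front = centers[:, 2, None, None] - np.sqrt(np.clip(r2 - d2, 0.0, None))
    depth = np.where(inside, front, background)
    # The nearest sphere along each ray is the visible one.
    return depth.min(axis=0)

def depth_fitting_loss(rendered, observed):
    """Mean absolute difference between rendered and observed depth maps."""
    return np.abs(rendered - observed).mean()

# Toy example: the "observed" map is one sphere; a shifted estimate fits worse.
observed = render_spheres([[0.0, 0.0, 2.0]], [0.5])
aligned = render_spheres([[0.0, 0.0, 2.0]], [0.5])
shifted = render_spheres([[0.3, 0.0, 2.0]], [0.5])
```

In this toy setting, `depth_fitting_loss(aligned, observed)` is zero and `depth_fitting_loss(shifted, observed)` is positive, which is the signal a fitting-based training loop would minimize with respect to the pose parameters that place the spheres.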

Cite

Text

Wan et al. "Self-Supervised 3D Hand Pose Estimation Through Training by Fitting." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.01111

Markdown

[Wan et al. "Self-Supervised 3D Hand Pose Estimation Through Training by Fitting." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/wan2019cvpr-selfsupervised/) doi:10.1109/CVPR.2019.01111

BibTeX

@inproceedings{wan2019cvpr-selfsupervised,
  title     = {{Self-Supervised 3D Hand Pose Estimation Through Training by Fitting}},
  author    = {Wan, Chengde and Probst, Thomas and Van Gool, Luc and Yao, Angela},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2019},
  doi       = {10.1109/CVPR.2019.01111},
  url       = {https://mlanthology.org/cvpr/2019/wan2019cvpr-selfsupervised/}
}