Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild

Abstract

We introduce a simple and effective network architecture for monocular 3D hand pose estimation consisting of an image encoder followed by a mesh convolutional decoder that is trained through a direct 3D hand mesh reconstruction loss. We train our network by gathering a large-scale dataset of hand actions in YouTube videos and using it as a source of weak supervision. Our weakly-supervised, mesh-convolution-based system largely outperforms state-of-the-art methods, even halving the errors on the in-the-wild benchmark. The dataset and additional resources are available at https://arielai.com/mesh_hands.
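
To illustrate what a "direct 3D hand mesh reconstruction loss" could look like, here is a minimal sketch. It assumes the predicted and reference meshes share the same topology, so vertices correspond one to one, and it assumes a mean absolute (L1) difference over vertex coordinates. The abstract does not state the exact form of the loss, so the function name `mesh_l1_loss` and the choice of L1 are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence

Vertex = Sequence[float]  # one mesh vertex as (x, y, z)


def mesh_l1_loss(predicted: Sequence[Vertex], target: Sequence[Vertex]) -> float:
    """Mean absolute coordinate difference between two meshes.

    Hypothetical helper: assumes both meshes list the same vertices
    in the same order (shared topology).
    """
    if len(predicted) != len(target):
        raise ValueError("meshes must have the same number of vertices")
    if not predicted:
        raise ValueError("meshes must contain at least one vertex")

    total = 0.0
    for p, t in zip(predicted, target):
        # Sum |x - x'| + |y - y'| + |z - z'| for this vertex pair.
        total += sum(abs(pc - tc) for pc, tc in zip(p, t))

    # Average over all 3 * N coordinates.
    return total / (3 * len(predicted))
```

In a training setup, a quantity of this kind would be computed between the decoder's output vertices and the reference mesh vertices, then minimized with respect to the network weights.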

Cite

Text

Kulon et al. "Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00504

Markdown

[Kulon et al. "Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/kulon2020cvpr-weaklysupervised/) doi:10.1109/CVPR42600.2020.00504

BibTeX

@inproceedings{kulon2020cvpr-weaklysupervised,
  title     = {{Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild}},
  author    = {Kulon, Dominik and Guler, Riza Alp and Kokkinos, Iasonas and Bronstein, Michael M. and Zafeiriou, Stefanos},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.00504},
  url       = {https://mlanthology.org/cvpr/2020/kulon2020cvpr-weaklysupervised/}
}