Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation Under Hand-Object Interaction
Abstract
We study how well different types of approaches generalise in the task of 3D hand pose estimation under single-hand scenarios and hand-object interaction. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is high-dimensional, it is inherently not feasible to cover the whole space densely, despite recent efforts in collecting large-scale training datasets. This sampling problem is even more severe when hands are interacting with objects and/or inputs are RGB rather than depth images, as RGB images also vary with lighting conditions and colors. To address these issues, we designed a public challenge (HANDS'19) to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. More exactly, HANDS'19 is designed (a) to evaluate the influence of both depth and color modalities on 3D hand pose estimation, under the presence or absence of objects; (b) to assess the generalisation abilities w.r.t. four main axes: shapes, articulations, viewpoints, and objects; (c) to explore the use of synthetic hand models to fill the gaps of current datasets. Through the challenge, the overall accuracy has dramatically improved over the baseline, especially on extrapolation tasks, from 27mm to 13mm mean joint error. Our analyses highlight the impacts of: ensemble approaches, the use of a parametric 3D hand model (MANO), and different HPE methods/backbones.
Cite
Text
Armagan et al. "Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation Under Hand-Object Interaction." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58592-1_6
Markdown
[Armagan et al. "Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation Under Hand-Object Interaction." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/armagan2020eccv-measuring/) doi:10.1007/978-3-030-58592-1_6
BibTeX
@inproceedings{armagan2020eccv-measuring,
title = {{Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation Under Hand-Object Interaction}},
author = {Armagan, Anil and Garcia-Hernando, Guillermo and Baek, Seungryul and Hampali, Shreyas and Rad, Mahdi and Zhang, Zhaohui and Xie, Shipeng and Chen, MingXiu and Zhang, Boshen and Xiong, Fu and Xiao, Yang and Cao, Zhiguo and Yuan, Junsong and Ren, Pengfei and Huang, Weiting and Sun, Haifeng and Hrúz, Marek and Kanis, Jakub and Krňoul, Zdeněk and Wan, Qingfu and Li, Shile and Yang, Linlin and Lee, Dongheui and Yao, Angela and Zhou, Weiguo and Mei, Sijia and Liu, Yunhui and Spurr, Adrian and Iqbal, Umar and Molchanov, Pavlo and Weinzaepfel, Philippe and Brégier, Romain and Rogez, Grégory and Lepetit, Vincent and Kim, Tae-Kyun},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2020},
doi = {10.1007/978-3-030-58592-1_6},
url = {https://mlanthology.org/eccv/2020/armagan2020eccv-measuring/}
}