Recognition from Hand Cameras: A Revisit with Deep Learning
Abstract
We revisit the study of a wrist-mounted camera system (referred to as HandCam) for recognizing the activities of hands. Compared with egocentric systems (referred to as HeadCam), HandCam has two unique properties: (1) it avoids the need to detect hands; (2) it observes the activities of hands more consistently. Taking advantage of these properties, we propose a deep-learning-based method to recognize hand states (free vs. active hands, hand gestures, object categories) and to discover object categories. Moreover, we propose a novel two-stream deep network that further takes advantage of both HandCam and HeadCam. We have collected a new synchronized HandCam and HeadCam dataset with 20 videos captured in three scenes for hand-state recognition. Experiments show that our HandCam system consistently outperforms, in all tasks, both a deep-learning-based HeadCam method (with estimated manipulation regions) and a dense-trajectory-based HeadCam method. We also show that HandCam videos captured by different users can be easily aligned to improve free vs. active recognition accuracy (a 3.3% improvement) in the cross-scene use case. In addition, we observe that fine-tuning the convolutional neural network consistently improves accuracy. Finally, our novel two-stream deep network combining HandCam and HeadCam achieves the best performance in four out of five tasks. With more data, we believe a joint HandCam and HeadCam system can robustly log hand states in daily life.
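As a rough illustration of what combining a HandCam stream with a HeadCam stream can mean at prediction time, the sketch below fuses per-class scores from the two streams by a weighted average of their softmax probabilities (simple late fusion). This is an assumption for illustration only, not the two-stream architecture described in the paper; the function names, the weight `alpha`, and the example scores are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fuse(hand_logits: Sequence[float],
              head_logits: Sequence[float],
              alpha: float = 0.5) -> List[float]:
    """Hypothetical late fusion of two camera streams.

    Returns a weighted average of the per-class probabilities:
    alpha weights the HandCam stream, (1 - alpha) the HeadCam stream.
    """
    if len(hand_logits) != len(head_logits):
        raise ValueError("both streams must score the same classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    p_hand = softmax(hand_logits)
    p_head = softmax(head_logits)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_hand, p_head)]


def predict(probs: Sequence[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Made-up scores for two classes: "free" (index 0) vs. "active" (index 1).
fused = late_fuse([0.2, 1.5], [1.0, 0.8], alpha=0.5)
print(predict(fused))  # prints 1
```

Here the HandCam stream is confident the hand is active while the HeadCam stream mildly prefers "free"; averaging probabilities lets the more confident stream dominate. A learned fusion (for example, training a classifier on concatenated features from both streams) is a common alternative to this fixed weighting.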
Cite
Text
Chan et al. "Recognition from Hand Cameras: A Revisit with Deep Learning." European Conference on Computer Vision, 2016. doi:10.1007/978-3-319-46493-0_31
Markdown
[Chan et al. "Recognition from Hand Cameras: A Revisit with Deep Learning." European Conference on Computer Vision, 2016.](https://mlanthology.org/eccv/2016/chan2016eccv-recognition/) doi:10.1007/978-3-319-46493-0_31
BibTeX
@inproceedings{chan2016eccv-recognition,
title = {{Recognition from Hand Cameras: A Revisit with Deep Learning}},
author = {Chan, Cheng-Sheng and Chen, Shou-Zhong and Xie, Pei-Xuan and Chang, Chiung-Chih and Sun, Min},
booktitle = {European Conference on Computer Vision},
year = {2016},
pages = {505--521},
doi = {10.1007/978-3-319-46493-0_31},
url = {https://mlanthology.org/eccv/2016/chan2016eccv-recognition/}
}