Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications
Abstract
Egocentric videos offer fine-grained information for high-fidelity modeling of human behaviors. Hands and the objects they interact with are a crucial aspect of understanding the viewer's behaviors and intentions. We provide a labeled dataset consisting of 11,235 egocentric images with per-pixel segmentation labels of hands and interacting objects across diverse daily activities. Our dataset is the first to label detailed interacting hand-object contact boundaries. We introduce a context-aware compositional data augmentation technique to adapt to out-of-distribution YouTube egocentric video. We show that our robust hand-object segmentation model and dataset can serve as a foundation to boost or enable several downstream vision applications, including hand state classification, video activity recognition, 3D mesh reconstruction of hand-object interaction, and seeing through the hand with video inpainting in egocentric videos. All of our data and code will be released to the public.
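The abstract only names the compositional augmentation; as a rough illustration of the general idea (compositing a segmented hand-object foreground onto a new background frame while generating the matching label), the sketch below shows one possible formulation in Python. The function name, the scale range, and the placement heuristic are illustrative assumptions, not the paper's actual context-aware pipeline.

import random
from PIL import Image

def composite_hand_object(foreground, mask, background, scale_range=(0.8, 1.2)):
    """Paste a segmented hand-object crop onto a new background image.

    Generic compositional-augmentation sketch; `foreground` and `background`
    are RGB PIL images, `mask` is a binary hand-object mask ("L" mode)
    aligned with `foreground`.
    """
    # Randomly rescale the foreground and its mask together.
    scale = random.uniform(*scale_range)
    new_size = (int(foreground.width * scale), int(foreground.height * scale))
    fg = foreground.resize(new_size, Image.BILINEAR)
    m = mask.resize(new_size, Image.NEAREST)

    # Choose a paste location; a context-aware variant would bias this toward
    # plausible regions of an egocentric frame (e.g. near the image bottom).
    bg = background.copy()
    max_x = max(bg.width - fg.width, 1)
    max_y = max(bg.height - fg.height, 1)
    x, y = random.randrange(max_x), random.randrange(max_y)

    # Composite the crop and build the corresponding segmentation label.
    bg.paste(fg, (x, y), m)
    label = Image.new("L", bg.size, 0)
    label.paste(m, (x, y), m)
    return bg, label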
Cite
Text
Zhang et al. "Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19818-2_8
Markdown
[Zhang et al. "Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/zhang2022eccv-finegrained-a/) doi:10.1007/978-3-031-19818-2_8
BibTeX
@inproceedings{zhang2022eccv-finegrained-a,
title = {{Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications}},
author = {Zhang, Lingzhi and Zhou, Shenghao and Stent, Simon and Shi, Jianbo},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2022},
doi = {10.1007/978-3-031-19818-2_8},
url = {https://mlanthology.org/eccv/2022/zhang2022eccv-finegrained-a/}
}