HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
Abstract
We present HOI4D, a large-scale 4D egocentric dataset with rich annotations, to catalyze the research of category-level human-object interaction. HOI4D consists of 2.4M RGB-D egocentric video frames over 4000 sequences, collected by 9 participants interacting with 800 different object instances from 16 categories in 610 different indoor rooms. Frame-wise annotations for panoptic segmentation, motion segmentation, 3D hand pose, category-level object pose, and hand action are also provided, together with reconstructed object meshes and scene point clouds. With HOI4D, we establish three benchmark tasks to promote category-level HOI from 4D visual signals: semantic segmentation of 4D dynamic point cloud sequences, category-level object pose tracking, and egocentric action segmentation with diverse interaction targets. In-depth analysis shows that HOI4D poses great challenges to existing methods and opens up huge research opportunities.
Cite
Text
Liu et al. "HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.02034
Markdown
[Liu et al. "HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/liu2022cvpr-hoi4d/) doi:10.1109/CVPR52688.2022.02034
BibTeX
@inproceedings{liu2022cvpr-hoi4d,
  title     = {{HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction}},
  author    = {Liu, Yunze and Liu, Yun and Jiang, Che and Lyu, Kangbo and Wan, Weikang and Shen, Hao and Liang, Boqiang and Fu, Zhoujie and Wang, He and Yi, Li},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {21013--21022},
  doi       = {10.1109/CVPR52688.2022.02034},
  url       = {https://mlanthology.org/cvpr/2022/liu2022cvpr-hoi4d/}
}