Scaling Human-Object Interaction Recognition Through Zero-Shot Learning
Abstract
Recognizing human-object interactions (HOI) is an important part of distinguishing the rich variety of human action in the visual world. While recent progress has been made in improving HOI recognition in the fully supervised setting, the space of possible human-object interactions is large, and it is impractical to obtain labeled training data for all interactions of interest. In this work, we tackle the challenge of scaling HOI recognition to the long tail of categories through a zero-shot learning approach. We introduce a factorized model for HOI detection that disentangles reasoning on verbs and objects, and at test time can therefore produce detections for novel verb-object pairs. We present experiments on the recently introduced large-scale HICO-DET dataset, and show that our model performs comparably to the state of the art in fully supervised HOI detection while simultaneously achieving effective zero-shot detection of new HOI categories.
Cite
Text
Shen et al. "Scaling Human-Object Interaction Recognition Through Zero-Shot Learning." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018. doi:10.1109/WACV.2018.00181
Markdown
[Shen et al. "Scaling Human-Object Interaction Recognition Through Zero-Shot Learning." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018.](https://mlanthology.org/wacv/2018/shen2018wacv-scaling/) doi:10.1109/WACV.2018.00181
BibTeX
@inproceedings{shen2018wacv-scaling,
title = {{Scaling Human-Object Interaction Recognition Through Zero-Shot Learning}},
author = {Shen, Liyue and Yeung, Serena and Hoffman, Judy and Mori, Greg and Fei-Fei, Li},
booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
year = {2018},
pages = {1568--1576},
doi = {10.1109/WACV.2018.00181},
url = {https://mlanthology.org/wacv/2018/shen2018wacv-scaling/}
}