Learning Human-Object Interactions by Graph Parsing Neural Networks
Abstract
This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos. We introduce the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end. For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels. Within a message passing inference framework, GPNN iteratively computes the adjacency matrices and node labels. We extensively evaluate our model on three HOI detection benchmarks on images and videos: the HICO-DET, V-COCO, and CAD-120 datasets. Our approach significantly outperforms state-of-the-art methods, verifying that GPNN is scalable to large datasets and applies to spatial-temporal settings.
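The iterative inference described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `parse_graph`, the step names (link, message, update, readout), the random weights standing in for learned parameters, and the choice of sigmoid, tanh, and softmax are all assumptions made for illustration. It only shows the general pattern of alternately estimating a soft adjacency matrix and refining node states, then reading out node labels.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def parse_graph(node_feats, num_labels, steps=3, seed=0):
    """Hypothetical sketch: returns (soft adjacency matrix, node label probabilities)."""
    rng = np.random.default_rng(seed)
    n, d = node_feats.shape

    # Assumption: random weights stand in for parameters that would be learned.
    w_link = rng.normal(scale=0.1, size=(2 * d,))
    w_msg = rng.normal(scale=0.1, size=(d, d))
    w_upd = rng.normal(scale=0.1, size=(2 * d, d))
    w_out = rng.normal(scale=0.1, size=(d, num_labels))

    h = node_feats.copy()
    adj = np.zeros((n, n))
    for _ in range(steps):
        # Link step: score every ordered node pair (i, j) from the current states.
        h_i = np.repeat(h[:, None, :], n, axis=1)  # entry [i, j] holds h[i]
        h_j = np.repeat(h[None, :, :], n, axis=0)  # entry [i, j] holds h[j]
        pairs = np.concatenate([h_i, h_j], axis=-1)  # shape (n, n, 2d)
        adj = sigmoid(pairs @ w_link)  # soft adjacency, shape (n, n)
        np.fill_diagonal(adj, 0.0)  # no self-edges in this sketch

        # Message step: each node sums neighbor messages weighted by edge strength.
        messages = adj @ (h @ w_msg)  # shape (n, d)

        # Update step: combine the previous state with the incoming messages.
        h = np.tanh(np.concatenate([h, messages], axis=-1) @ w_upd)

    # Readout step: per-node label distribution.
    label_probs = softmax(h @ w_out)
    return adj, label_probs


if __name__ == "__main__":
    feats = np.random.default_rng(1).normal(size=(4, 8))  # 4 nodes, 8-dim features
    adj, probs = parse_graph(feats, num_labels=5)
    print(adj.shape, probs.shape)
```

Because every step is a differentiable array operation, a structure of this kind could in principle be trained end-to-end, which is the property the abstract emphasizes; the actual functions and training losses are defined in the paper itself.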
Cite
Text
Qi et al. "Learning Human-Object Interactions by Graph Parsing Neural Networks." Proceedings of the European Conference on Computer Vision (ECCV), 2018. doi:10.1007/978-3-030-01240-3_25
Markdown
[Qi et al. "Learning Human-Object Interactions by Graph Parsing Neural Networks." Proceedings of the European Conference on Computer Vision (ECCV), 2018.](https://mlanthology.org/eccv/2018/qi2018eccv-learning/) doi:10.1007/978-3-030-01240-3_25
BibTeX
@inproceedings{qi2018eccv-learning,
title = {{Learning Human-Object Interactions by Graph Parsing Neural Networks}},
author = {Qi, Siyuan and Wang, Wenguan and Jia, Baoxiong and Shen, Jianbing and Zhu, Song-Chun},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2018},
doi = {10.1007/978-3-030-01240-3_25},
url = {https://mlanthology.org/eccv/2018/qi2018eccv-learning/}
}