Learning Human-Object Interactions by Graph Parsing Neural Networks

Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu

ECCV 2018

doi:10.1007/978-3-030-01240-3_25

Abstract

This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos. We introduce the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end. For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure, represented by an adjacency matrix, and ii) the node labels. Within a message-passing inference framework, GPNN iteratively computes the adjacency matrices and node labels. We extensively evaluate our model on three HOI detection benchmarks on images and videos: the HICO-DET, V-COCO, and CAD-120 datasets. Our approach significantly outperforms state-of-the-art methods, verifying that GPNN is scalable to large datasets and applies to spatial-temporal settings.
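The abstract describes GPNN as a message-passing loop that alternates between estimating the HOI graph's adjacency matrix and refining node states that are finally read out as node labels. The following is a minimal, illustrative sketch of that alternation in plain Python, not the authors' implementation. Every name in it (`link_fn`, `message_fn`, `update_fn`, `readout_fn`, `gpnn_sketch`, `random_params`) is hypothetical. The specific forms are assumptions chosen for brevity: a sigmoid link score on the product of node states plus the edge feature, adjacency-weighted summed messages, a single tanh layer in place of a recurrent update, and a softmax readout. The weights are random and untrained.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def link_fn(h_v, h_w, e_vw, w_link):
    # Hypothetical link function (assumption): score the pair (v, w) from the
    # elementwise product of their states plus the edge feature, mapped to (0, 1).
    pair = [a * b for a, b in zip(h_v, h_w)] + list(e_vw)
    return sigmoid(dot(pair, w_link))

def message_fn(h_w, e_vw):
    # Hypothetical message function (assumption): neighbour state + edge feature.
    return list(h_w) + list(e_vw)

def update_fn(h_v, m_v, w_update):
    # Hypothetical update function (assumption): one tanh layer on [h_v, m_v].
    # A recurrent cell such as a GRU could be used here instead.
    x = list(h_v) + list(m_v)
    return [math.tanh(dot(x, row)) for row in w_update]

def readout_fn(h_v, w_out):
    # Softmax over label logits for a single node.
    logits = [dot(h_v, row) for row in w_out]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gpnn_sketch(node_feats, edge_feats, params, steps=3):
    """Alternate between estimating a soft adjacency matrix and passing messages.

    node_feats: n vectors of length D (used as initial hidden states).
    edge_feats: n x n grid of vectors of length E (diagonal entries are ignored).
    Returns (soft adjacency matrix, per-node label distributions).
    """
    n = len(node_feats)
    h = [list(f) for f in node_feats]
    adj = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        # 1) Re-estimate the graph structure from the current node states.
        for v in range(n):
            for w in range(n):
                if v != w:
                    adj[v][w] = link_fn(h[v], h[w], edge_feats[v][w], params["link"])
        # 2) Sum adjacency-weighted messages, then update every node.
        new_h = []
        for v in range(n):
            m_v = [0.0] * (len(h[v]) + len(edge_feats[v][v]))
            for w in range(n):
                if v != w:
                    msg = message_fn(h[w], edge_feats[v][w])
                    m_v = [acc + adj[v][w] * x for acc, x in zip(m_v, msg)]
            new_h.append(update_fn(h[v], m_v, params["update"]))
        h = new_h
    labels = [readout_fn(h_v, params["out"]) for h_v in h]
    return adj, labels

def random_params(d, e, num_labels, seed=0):
    # Random, untrained weights with shapes matching the functions above.
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        "link": mat(1, d + e)[0],      # length D + E
        "update": mat(d, 2 * d + e),   # D rows of length 2D + E
        "out": mat(num_labels, d),     # one row per label
    }

if __name__ == "__main__":
    rng = random.Random(1)
    n, d, e, c = 3, 4, 2, 5  # e.g. one human node and two object nodes
    nodes = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
    edges = [[[rng.uniform(-1, 1) for _ in range(e)] for _ in range(n)] for _ in range(n)]
    adj, labels = gpnn_sketch(nodes, edges, random_params(d, e, c))
    print([[round(a, 3) for a in row] for row in adj])
    print([max(range(c), key=p.__getitem__) for p in labels])
```

The key design point the sketch tries to mirror is that the adjacency matrix is recomputed at every iteration from the current node states, so the inferred graph structure and the node representations refine each other. In a trained model the four functions would be learned networks; here they only demonstrate the data flow.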


Cite

Text

Qi et al. "Learning Human-Object Interactions by Graph Parsing Neural Networks." Proceedings of the European Conference on Computer Vision (ECCV), 2018. doi:10.1007/978-3-030-01240-3_25

Markdown

[Qi et al. "Learning Human-Object Interactions by Graph Parsing Neural Networks." Proceedings of the European Conference on Computer Vision (ECCV), 2018.](https://mlanthology.org/eccv/2018/qi2018eccv-learning/) doi:10.1007/978-3-030-01240-3_25

BibTeX

@inproceedings{qi2018eccv-learning,
  title     = {{Learning Human-Object Interactions by Graph Parsing Neural Networks}},
  author    = {Qi, Siyuan and Wang, Wenguan and Jia, Baoxiong and Shen, Jianbing and Zhu, Song-Chun},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2018},
  doi       = {10.1007/978-3-030-01240-3_25},
  url       = {https://mlanthology.org/eccv/2018/qi2018eccv-learning/}
}