Learning to Detect Human-Object Interactions

Abstract

We study the problem of detecting human-object interactions (HOI) in static images, defined as predicting a human and an object bounding box with an interaction class label that connects them. HOI detection is a fundamental problem in computer vision as it provides semantic information about the interactions among the detected objects. We introduce HICO-DET, a new large benchmark for HOI detection, by augmenting the current HICO classification benchmark with instance annotations. To solve the task, we propose Human-Object Region-based Convolutional Neural Networks (HO-RCNN). At the core of our HO-RCNN is the Interaction Pattern, a novel DNN input that characterizes the spatial relations between two bounding boxes. Experiments on HICO-DET demonstrate that our HO-RCNN, by exploiting human-object spatial relations through Interaction Patterns, significantly improves the performance of HOI detection over baseline approaches.
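As an illustration of what an input that "characterizes the spatial relations between two bounding boxes" could look like, here is a minimal sketch. It is not confirmed by the abstract as the authors' exact construction. It assumes a two-channel binary encoding: one channel marks cells covered by the human box, the other marks cells covered by the object box, both rasterized inside the tightest window enclosing the two boxes. The function name `interaction_pattern`, the `size` parameter, and the choice to stretch the window to a square grid are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def interaction_pattern(human: Box, obj: Box, size: int = 8) -> List[List[List[int]]]:
    """Rasterize two boxes into a 2 x size x size binary grid.

    Assumed encoding for illustration only: channel 0 is the human box,
    channel 1 is the object box.
    """
    # Enclosing window: the tightest box that contains both inputs.
    wx1 = min(human[0], obj[0])
    wy1 = min(human[1], obj[1])
    wx2 = max(human[2], obj[2])
    wy2 = max(human[3], obj[3])
    # Guard against a degenerate (zero-area) window.
    w = max(wx2 - wx1, 1e-6)
    h = max(wy2 - wy1, 1e-6)

    channels = []
    for x1, y1, x2, y2 in (human, obj):
        grid = [[0] * size for _ in range(size)]
        for r in range(size):
            for c in range(size):
                # Centre of grid cell (r, c), mapped back to image coordinates.
                cx = wx1 + (c + 0.5) / size * w
                cy = wy1 + (r + 0.5) / size * h
                if x1 <= cx <= x2 and y1 <= cy <= y2:
                    grid[r][c] = 1
        channels.append(grid)
    return channels


# A person on the left half, an object on the right half of an 8x8 window.
pattern = interaction_pattern((0, 0, 4, 8), (4, 0, 8, 8), size=8)
print(pattern[0][0])  # first row of the human channel
print(pattern[1][0])  # first row of the object channel
```

Because the grid depends only on box coordinates and not on pixel values, such an encoding captures relative position and scale (for example, an object held above a person versus one beside them) independently of appearance. Stretching the window to a square discards its aspect ratio; preserving it with padding is an equally plausible design.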

Cite

Text

Chao et al. "Learning to Detect Human-Object Interactions." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018. doi:10.1109/WACV.2018.00048

Markdown

[Chao et al. "Learning to Detect Human-Object Interactions." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018.](https://mlanthology.org/wacv/2018/chao2018wacv-learning/) doi:10.1109/WACV.2018.00048

BibTeX

@inproceedings{chao2018wacv-learning,
  title     = {{Learning to Detect Human-Object Interactions}},
  author    = {Chao, Yu-Wei and Liu, Yunfan and Liu, Xieyang and Zeng, Huayi and Deng, Jia},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2018},
  pages     = {381--389},
  doi       = {10.1109/WACV.2018.00048},
  url       = {https://mlanthology.org/wacv/2018/chao2018wacv-learning/}
}