Amplifying Key Cues for Human-Object-Interaction Detection

Abstract

Human-object interaction (HOI) detection aims to detect and recognise how people interact with the objects that surround them. This is challenging as different interaction categories are often distinguished only by very subtle visual differences in the scene. In this paper we introduce two methods to amplify key cues in the image, and also a method to combine these and other cues when considering the interaction between a human and an object. First, we introduce an encoding mechanism for representing the fine-grained spatial layout of the human and object (a subtle cue) and also semantic context (a cue represented by text embeddings of surrounding objects). Second, we use plausible future movements of humans and objects as a cue to constrain the space of possible interactions. Third, we use a gate and memory architecture as a fusion module to combine the cues. We demonstrate that these three improvements lead to performance that exceeds prior HOI methods across standard benchmarks by a considerable margin.
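
To make the fusion idea concrete, below is a minimal sketch (not the authors' code) of a gate-style fusion over per-cue features in PyTorch. The module names, feature dimension, and interaction-class count are illustrative assumptions, and the memory component described in the paper is omitted.

# Illustrative sketch only: gated fusion of per-cue features
# (e.g. appearance, spatial layout, semantic context).
# All names and dimensions are assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class GatedCueFusion(nn.Module):
    """Fuses a list of cue features by learning a scalar gate per cue."""

    def __init__(self, num_cues: int, dim: int, num_classes: int):
        super().__init__()
        # One gating network per cue: maps the cue feature to a scalar in (0, 1).
        self.gates = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid()) for _ in range(num_cues)]
        )
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, cues):
        # cues: list of (batch, dim) tensors, one per cue stream.
        fused = torch.zeros_like(cues[0])
        for cue, gate in zip(cues, self.gates):
            fused = fused + gate(cue) * cue  # amplify or suppress each cue
        return self.classifier(fused)


# Usage: three cue streams (appearance, spatial layout, semantic context),
# each already encoded into a 512-d feature vector; 117 classes is assumed.
cues = [torch.randn(4, 512) for _ in range(3)]
logits = GatedCueFusion(num_cues=3, dim=512, num_classes=117)(cues)
print(logits.shape)  # torch.Size([4, 117])

The gate plays the "amplify" role suggested by the title: each cue is scaled up or down before the streams are summed and classified.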

Cite

Text

Liu et al. "Amplifying Key Cues for Human-Object-Interaction Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58568-6_15

Markdown

[Liu et al. "Amplifying Key Cues for Human-Object-Interaction Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/liu2020eccv-amplifying/) doi:10.1007/978-3-030-58568-6_15

BibTeX

@inproceedings{liu2020eccv-amplifying,
  title     = {{Amplifying Key Cues for Human-Object-Interaction Detection}},
  author    = {Liu, Yang and Chen, Qingchao and Zisserman, Andrew},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2020},
  doi       = {10.1007/978-3-030-58568-6_15},
  url       = {https://mlanthology.org/eccv/2020/liu2020eccv-amplifying/}
}