Neural-Logic Human-Object Interaction Detection

Abstract

The interaction decoder used in prevalent Transformer-based HOI detectors typically accepts pre-composed human-object pairs as input. Though it achieves remarkable performance, this paradigm lacks feasibility and cannot explore novel combinations of entities during decoding. We present LogicHOI, a new HOI detector that leverages neural-logic reasoning and Transformer to infer feasible interactions between entities. Specifically, we modify the self-attention mechanism in the vanilla Transformer, enabling it to reason over the ⟨human, action, object⟩ triplet and constitute novel interactions. This reasoning process is guided by two properties that are crucial for understanding HOI: affordances (the potential actions an object can facilitate) and proxemics (the spatial relations between humans and objects). We formulate these two properties in first-order logic and ground them into continuous space to constrain the learning process of our approach, leading to improved performance and zero-shot generalization capabilities. We evaluate LogicHOI on V-COCO and HICO-DET under both normal and zero-shot setups, achieving significant improvements over existing methods.
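
To make the idea of grounding a first-order rule into continuous space more concrete, the sketch below relaxes one affordance-style rule, "if object o is present, then at least one action afforded by o is predicted", into a penalty in [0, 1]. It is an illustration only and is not taken from the paper: the choice of the Reichenbach implication and the probabilistic sum for disjunction, as well as the names `implies`, `affordance_rule_loss`, and the example labels, are all assumptions.

```python
def implies(p, q):
    """Reichenbach fuzzy implication (assumed relaxation): truth of p -> q for p, q in [0, 1]."""
    return 1.0 - p + p * q


def affordance_rule_loss(object_prob, action_probs, afforded):
    """Penalty for violating: object(o) -> OR over afforded actions a of action(a).

    object_prob  : predicted probability that the object is present
    action_probs : dict mapping action name -> predicted probability
    afforded     : set of action names the object affords (hypothetical prior)
    """
    # Fuzzy disjunction via probabilistic sum: 1 - prod(1 - p_a) over afforded actions.
    none_afforded = 1.0
    for action, p in action_probs.items():
        if action in afforded:
            none_afforded *= 1.0 - p
    any_afforded = 1.0 - none_afforded

    # The rule's truth value is in [0, 1]; its complement serves as the loss.
    return 1.0 - implies(object_prob, any_afforded)


# Hypothetical example: a detected bicycle affords "ride" but not "eat".
consistent = affordance_rule_loss(1.0, {"ride": 1.0, "eat": 0.0}, {"ride"})
violating = affordance_rule_loss(1.0, {"ride": 0.0, "eat": 1.0}, {"ride"})
```

In this toy setting the penalty is zero when the predicted action is one the object affords and grows toward one when a confident object detection is paired only with actions outside its affordance set. Because every operation is a product or a sum, the same expression would be differentiable if the probabilities were produced by a network.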

Cite

Text

Li et al. "Neural-Logic Human-Object Interaction Detection." Neural Information Processing Systems, 2023.

Markdown

[Li et al. "Neural-Logic Human-Object Interaction Detection." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/li2023neurips-neurallogic/)

BibTeX

@inproceedings{li2023neurips-neurallogic,
  title     = {{Neural-Logic Human-Object Interaction Detection}},
  author    = {Li, Liulei and Wei, Jianan and Wang, Wenguan and Yang, Yi},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/li2023neurips-neurallogic/}
}