Discovering Syntactic Interaction Clues for Human-Object Interaction Detection

Luo, Jinguo; Ren, Weihong; Jiang, Weibo; Chen, Xi'ai; Wang, Qiang; Han, Zhi; Liu, Honghai

doi:10.1109/CVPR52733.2024.02665

CVPR 2024 pp. 28212-28222

Abstract

Recently, Vision-Language Models (VLMs) have greatly advanced Human-Object Interaction (HOI) detection. Existing VLM-based HOI detectors typically adopt a hand-crafted template (e.g., "a photo of a person [action] a/an [object]") to acquire text knowledge through the VLM text encoder. However, such approaches encode only action-specific text prompts at the vocabulary level, and may suffer from learning ambiguity because they do not explore fine-grained clues from the perspective of interaction context. In this paper, we propose a novel method to discover Syntactic Interaction Clues for HOI detection (SICHOI) using a VLM. Specifically, we first investigate the essential elements of an interaction context, and then establish a syntactic interaction bank at three levels: spatial relationship, action-oriented posture, and situational condition. Further, to align visual features with the syntactic interaction bank, we adopt a multi-view extractor to jointly aggregate visual features from the instance, interaction, and image levels. In addition, we introduce a dual cross-attention decoder to perform context propagation between text knowledge and visual features, thereby enhancing HOI detection. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on HICO-DET and V-COCO.
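
For illustration only, the hand-crafted template that the abstract attributes to existing VLM-based detectors can be sketched as below. This is a minimal sketch, not the authors' code: the function names `article_for` and `build_prompt`, the example action/object pairs, and the spelling-based choice between "a" and "an" are all assumptions introduced here.

```python
# Hypothetical helpers that fill the template
# "a photo of a person [action] a/an [object]" quoted in the abstract.

def article_for(noun: str) -> str:
    """Pick 'a' or 'an' from the first letter (a spelling heuristic, not a phonetic rule)."""
    return "an" if noun[:1].lower() in {"a", "e", "i", "o", "u"} else "a"


def build_prompt(action: str, obj: str) -> str:
    """Return one vocabulary-level text prompt for an (action, object) pair."""
    return f"a photo of a person {action} {article_for(obj)} {obj}"


# Example pairs chosen for illustration; they are not taken from the paper.
pairs = [("riding", "bicycle"), ("eating", "apple")]
prompts = [build_prompt(action, obj) for action, obj in pairs]

for prompt in prompts:
    print(prompt)
```

In a VLM-based detector, strings like these would be passed to the text encoder; that step is omitted here because the abstract does not specify the encoder or its interface.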

Cite

Text

Luo et al. "Discovering Syntactic Interaction Clues for Human-Object Interaction Detection." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.02665

Markdown

[Luo et al. "Discovering Syntactic Interaction Clues for Human-Object Interaction Detection." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/luo2024cvpr-discovering/) doi:10.1109/CVPR52733.2024.02665

BibTeX

@inproceedings{luo2024cvpr-discovering,
  title     = {{Discovering Syntactic Interaction Clues for Human-Object Interaction Detection}},
  author    = {Luo, Jinguo and Ren, Weihong and Jiang, Weibo and Chen, Xi'ai and Wang, Qiang and Han, Zhi and Liu, Honghai},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {28212-28222},
  doi       = {10.1109/CVPR52733.2024.02665},
  url       = {https://mlanthology.org/cvpr/2024/luo2024cvpr-discovering/}
}