Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

Yuan, Hangjie; Wang, Mang; Ni, Dong; Xu, Liangpeng

doi:10.1609/AAAI.V36I3.20229

AAAI 2022 pp. 3206-3214

Abstract

Human-Object Interaction (HOI) detection is an essential task for understanding human-centric images from a fine-grained perspective. Although end-to-end HOI detection models thrive, their paradigm of parallel human/object detection and verb class prediction loses a merit of two-stage methods: the object-guided hierarchy. The object in an HOI triplet gives direct clues to the verb to be predicted. In this paper, we aim to boost end-to-end models with object-guided statistical priors. Specifically, we propose to utilize a Verb Semantic Model (VSM) and use semantic aggregation to profit from this object-guided hierarchy. A Similarity KL (SKL) loss is proposed to optimize the VSM to align with the HOI dataset's priors. To overcome the static semantic embedding problem, we propose to generate cross-modality-aware visual and semantic features by Cross-Modal Calibration (CMC). Combined, the above modules compose the Object-guided Cross-modal Calibration Network (OCN). Experiments conducted on two popular HOI detection benchmarks demonstrate the significance of incorporating the statistical prior knowledge and yield state-of-the-art performance. More detailed analysis indicates that the proposed modules serve as a stronger verb predictor and a superior method of utilizing prior knowledge. The code is available at https://github.com/JacobYuan7/OCN-HOI-Benchmark.

Cite

Text

Yuan et al. "Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics." AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/AAAI.V36I3.20229

Markdown

[Yuan et al. "Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/yuan2022aaai-detecting/) doi:10.1609/AAAI.V36I3.20229

BibTeX

@inproceedings{yuan2022aaai-detecting,
  title     = {{Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics}},
  author    = {Yuan, Hangjie and Wang, Mang and Ni, Dong and Xu, Liangpeng},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {3206-3214},
  doi       = {10.1609/AAAI.V36I3.20229},
  url       = {https://mlanthology.org/aaai/2022/yuan2022aaai-detecting/}
}