Exploring Pose-Aware Human-Object Interaction via Hybrid Learning

Abstract

Human-Object Interaction (HOI) detection plays a crucial role in visual scene comprehension. In recent advancements, two-stage detectors have taken a prominent position. However, they are encumbered by two primary challenges. First, the misalignment between feature representation and relation reasoning leads to a lack of the discriminative features that are crucial for interaction detection. Second, due to sparse annotation, the second-stage interaction head generates numerous candidate <human, object> pairs, with only a small fraction receiving supervision. To address these issues, we propose a hybrid learning method based on pose-aware HOI feature refinement. Specifically, we devise pose-aware feature refinement that encodes spatial features by considering human body pose characteristics. It can direct attention towards key regions, ultimately offering a wealth of fine-grained features imperative for HOI detection. Further, we introduce a hybrid learning method that combines HOI triplets with probabilistic soft-label supervision, which is regenerated from decoupled verb-object pairs. This method explores the implicit connections between interactions, enhancing model generalization without requiring additional data. Our method establishes state-of-the-art performance on the HICO-DET benchmark and excels notably in detecting rare HOIs.
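
As a rough illustration of the hybrid supervision idea, the sketch below recombines decoupled verb and object probabilities into soft labels for HOI classes and adds them to a standard hard-label loss. It is only a minimal sketch under stated assumptions, not the authors' formulation: the product rule for soft labels, the binary cross-entropy form, the `weight` parameter, and all function names and example values are hypothetical choices made for illustration.

```python
import math

def soft_labels(verb_probs, object_probs, hoi_classes):
    """Recombine decoupled verb and object probabilities into one soft
    label per HOI class (assumed rule: product of the two probabilities)."""
    return [verb_probs[v] * object_probs[o] for v, o in hoi_classes]

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted scores and targets."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)

def hybrid_loss(pred, hard, soft, weight=0.5):
    """Hard-label triplet loss plus a weighted soft-label term
    (the weighting scheme is an assumption)."""
    return bce(pred, hard) + weight * bce(pred, soft)

# Hypothetical example: three HOI classes for one <human, object> pair.
hoi_classes = [("ride", "bicycle"), ("hold", "bicycle"), ("ride", "horse")]
verb_probs = {"ride": 0.8, "hold": 0.5}
object_probs = {"bicycle": 0.9, "horse": 0.1}

soft = soft_labels(verb_probs, object_probs, hoi_classes)
hard = [1.0, 0.0, 0.0]   # only "ride bicycle" is annotated
pred = [0.7, 0.3, 0.1]   # model scores per HOI class
loss = hybrid_loss(pred, hard, soft)
```

In this toy setup the unannotated class "hold bicycle" still receives a non-zero soft target, which is one way a model could be given signal for interactions that sparse annotation leaves unsupervised.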

Cite

Text

Wu et al. "Exploring Pose-Aware Human-Object Interaction via Hybrid Learning." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01687

Markdown

[Wu et al. "Exploring Pose-Aware Human-Object Interaction via Hybrid Learning." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/wu2024cvpr-exploring/) doi:10.1109/CVPR52733.2024.01687

BibTeX

@inproceedings{wu2024cvpr-exploring,
  title     = {{Exploring Pose-Aware Human-Object Interaction via Hybrid Learning}},
  author    = {Wu, Eastman Z Y and Li, Yali and Wang, Yuan and Wang, Shengjin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {17815-17825},
  doi       = {10.1109/CVPR52733.2024.01687},
  url       = {https://mlanthology.org/cvpr/2024/wu2024cvpr-exploring/}
}