Detecting Human-Object Interaction with Mixed Supervision

Abstract

Human-object interaction (HOI) detection is an important task in image understanding and reasoning. An HOI is expressed as a triplet <human, verb, object>, so the task requires bounding boxes for the human and the object as well as the action between them. In other words, training requires strong supervision, which is hard to procure. A natural way to overcome this is weakly-supervised learning, where we know only that certain HOI triplets are present in an image, not their exact locations. Most weakly-supervised methods make no provision for leveraging strongly supervised data when it is available, and a naive combination of these two paradigms in HOI detection fails to let them benefit each other. We therefore propose a mixed-supervised HOI detection pipeline: thanks to a specific design of momentum-independent learning, it learns seamlessly across the two types of supervision. Moreover, in light of the annotation insufficiency in mixed supervision, we introduce an HOI element swapping technique that synthesizes diverse and hard negatives across images and improves the robustness of the model. Our method is evaluated on the challenging HICO-DET dataset. It outperforms state-of-the-art weakly- and fully-supervised methods under the same setting, and it performs close to or even better than many fully-supervised methods while using a mix of full and weak supervision.

Cite

Text

Kumaraswamy et al. "Detecting Human-Object Interaction with Mixed Supervision." Winter Conference on Applications of Computer Vision, 2021.

Markdown

[Kumaraswamy et al. "Detecting Human-Object Interaction with Mixed Supervision." Winter Conference on Applications of Computer Vision, 2021.](https://mlanthology.org/wacv/2021/kumaraswamy2021wacv-detecting/)

BibTeX

@inproceedings{kumaraswamy2021wacv-detecting,
  title     = {{Detecting Human-Object Interaction with Mixed Supervision}},
  author    = {Kumaraswamy, Suresh Kirthi and Shi, Miaojing and Kijak, Ewa},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2021},
  pages     = {1228-1237},
  url       = {https://mlanthology.org/wacv/2021/kumaraswamy2021wacv-detecting/}
}