End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation
Abstract
Most existing Human-Object Interaction (HOI) detection methods rely heavily on full annotations with predefined HOI categories, which are limited in diversity and costly to scale further. We aim to advance zero-shot HOI detection so that both seen and unseen HOIs are detected simultaneously. The fundamental challenges are to discover potential human-object pairs and to identify novel HOI categories. To overcome these challenges, we propose a novel End-to-end zero-shot HOI Detection (EoID) framework via vision-language knowledge distillation. We first design an Interactive Score module combined with a Two-stage Bipartite Matching algorithm to distinguish interactive human-object pairs in an action-agnostic manner. Then we transfer the action probability distribution from the pretrained vision-language teacher, together with the seen ground truth, to the HOI model to achieve zero-shot HOI classification. Extensive experiments on the HICO-Det dataset demonstrate that our model discovers potential interactive pairs and enables the recognition of unseen HOIs. Our method also outperforms the previous state of the art (SOTA) under various zero-shot settings. Moreover, it generalizes to large-scale object detection data to further scale up the action sets. The source code is available at: https://github.com/mrwu-mac/EoID.
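The abstract does not spell out the distillation objective. The sketch below is a rough illustration only. It shows one common way to combine a teacher's soft action distribution with seen-class ground truth: a binary cross-entropy term restricted to seen action classes, plus a temperature-scaled KL-divergence term over all action classes, including unseen ones. The function names (`seen_bce`, `zero_shot_action_loss`), the weighting `alpha`, the temperature, and the choice of KL divergence are all hypothetical assumptions. None of them is taken from the EoID paper or its repository.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: a generic knowledge-distillation loss for per-pair
# action classification. This is NOT the exact objective used by EoID.


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two distributions over the same action classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def seen_bce(student_logits: Sequence[float], targets: Sequence[float],
             seen_mask: Sequence[bool], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy computed over seen action classes only."""
    terms = []
    for z, y, seen in zip(student_logits, targets, seen_mask):
        if not seen:
            continue  # unseen classes have no ground-truth supervision
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        terms.append(-(y * math.log(p) + (1.0 - y) * math.log(1.0 - p)))
    return sum(terms) / len(terms) if terms else 0.0


def zero_shot_action_loss(student_logits: Sequence[float],
                          teacher_logits: Sequence[float],
                          targets: Sequence[float],
                          seen_mask: Sequence[bool],
                          temperature: float = 1.0,
                          alpha: float = 0.5) -> float:
    """Blend seen-class supervision with distillation from the teacher."""
    supervised = seen_bce(student_logits, targets, seen_mask)
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    # Scaling by T^2 is a common distillation convention that keeps the
    # distillation term's magnitude comparable across temperatures.
    distill = kl_divergence(teacher_probs, student_probs) * temperature ** 2
    return (1.0 - alpha) * supervised + alpha * distill


# Usage: four hypothetical action classes; the last one is unseen in training.
teacher = [2.0, 0.5, -1.0, 1.5]   # e.g. vision-language similarity scores
student = [1.8, 0.2, -0.5, 0.1]
targets = [1.0, 0.0, 0.0, 0.0]    # the unseen class's value is ignored
seen = [True, True, True, False]
print(zero_shot_action_loss(student, teacher, targets, seen, temperature=2.0))
```

The teacher term covers every action class, so the student receives a training signal for unseen actions even though the supervised term ignores them. In this generic formulation, that split is what makes zero-shot classification possible.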
Cite
Wu et al. "End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I3.25385
[Wu et al. "End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/wu2023aaai-end/) doi:10.1609/AAAI.V37I3.25385
@inproceedings{wu2023aaai-end,
title = {{End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation}},
author = {Wu, Mingrui and Gu, Jiaxin and Shen, Yunhang and Lin, Mingbao and Chen, Chao and Sun, Xiaoshuai},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2023},
pages = {2839--2846},
doi = {10.1609/AAAI.V37I3.25385},
url = {https://mlanthology.org/aaai/2023/wu2023aaai-end/}
}