Zero-Shot Object Detection with Textual Descriptions
Abstract
Object detection is important in real-world applications. Existing methods mainly focus on object detection with sufficient labelled training data or zero-shot object detection with only concept names. In this paper, we address the challenging problem of zero-shot object detection with natural language descriptions, which aims to simultaneously detect and recognize novel concept instances from textual descriptions. We propose a novel deep learning framework to jointly learn visual units, visual-unit attention and word-level attention, which are combined to compute word-proposal affinity via element-wise multiplication. To the best of our knowledge, this is the first work on zero-shot object detection with textual descriptions. Since there is no directly related work in the literature, we investigate plausible solutions based on existing zero-shot object detection methods for a fair comparison. We conduct extensive experiments on three challenging benchmark datasets, and the results confirm the superiority of the proposed model.
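The combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; tensor names, shapes, and the PyTorch-based scoring below are illustrative assumptions about how visual units, visual-unit attention and word-level attention could be multiplied element-wise over a shared unit dimension to yield word-proposal affinities.

# Minimal sketch (assumed, not the paper's code): word-proposal affinity as an
# element-wise combination of visual units, visual-unit attention and
# word-level attention over a shared "unit" dimension.
import torch

num_proposals, num_words, num_units = 4, 6, 32   # hypothetical sizes

visual_units = torch.randn(num_proposals, num_units)                           # unit activations per region proposal
unit_attention = torch.softmax(torch.randn(num_proposals, num_units), dim=-1)  # visual-unit attention per proposal
word_attention = torch.softmax(torch.randn(num_words, num_units), dim=-1)      # word-level attention over units

# Weight each proposal's unit activations by its visual-unit attention,
# then score every (word, proposal) pair against the word-level attention
# over the same units: affinity[w, p] = sum_k word_attention[w, k]
#                                       * unit_attention[p, k] * visual_units[p, k]
attended_units = visual_units * unit_attention   # (proposals, units)
affinity = word_attention @ attended_units.T     # (words, proposals)

print(affinity.shape)  # torch.Size([6, 4])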
Cite
Text
Li et al. "Zero-Shot Object Detection with Textual Descriptions." AAAI Conference on Artificial Intelligence, 2019. doi:10.1609/AAAI.V33I01.33018690Markdown
[Li et al. "Zero-Shot Object Detection with Textual Descriptions." AAAI Conference on Artificial Intelligence, 2019.](https://mlanthology.org/aaai/2019/li2019aaai-zero-a/) doi:10.1609/AAAI.V33I01.33018690BibTeX
@inproceedings{li2019aaai-zero-a,
title = {{Zero-Shot Object Detection with Textual Descriptions}},
author = {Li, Zhihui and Yao, Lina and Zhang, Xiaoqin and Wang, Xianzhi and Kanhere, Salil S. and Zhang, Huaxiang},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2019},
pages = {8690-8697},
doi = {10.1609/AAAI.V33I01.33018690},
url = {https://mlanthology.org/aaai/2019/li2019aaai-zero-a/}
}