Zero-Shot Object Detection with Textual Descriptions

Abstract

Object detection is important in real-world applications. Existing methods mainly focus on object detection with sufficient labelled training data or zero-shot object detection with only concept names. In this paper, we address the challenging problem of zero-shot object detection with natural language descriptions, which aims to simultaneously detect and recognize novel concept instances from their textual descriptions. We propose a novel deep learning framework that jointly learns visual units, visual-unit attention and word-level attention, which are combined via element-wise multiplication to produce word-proposal affinities. To the best of our knowledge, this is the first work on zero-shot object detection with textual descriptions. Since there is no directly related work in the literature, we investigate plausible solutions based on existing zero-shot object detection methods for a fair comparison. We conduct extensive experiments on three challenging benchmark datasets, and the results confirm the superiority of the proposed model.
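
The sketch below is an illustrative reading of the affinity mechanism described in the abstract, not the authors' implementation: region-proposal features are projected onto a bank of visual units, each description word attends over those units, a word-level attention pools the per-word scores, and the combination is done by element-wise multiplication. All module names, dimensions, and activation choices here are assumptions for illustration.

import torch
import torch.nn as nn

class WordProposalAffinity(nn.Module):
    """Hypothetical sketch: score each region proposal against a textual
    description by combining visual units, visual-unit attention, and
    word-level attention through element-wise multiplication."""

    def __init__(self, visual_dim, word_dim, num_units):
        super().__init__()
        # Project proposal features onto a bank of "visual units".
        self.unit_proj = nn.Linear(visual_dim, num_units)
        # Visual-unit attention conditioned on each word.
        self.unit_attn = nn.Linear(word_dim, num_units)
        # Word-level attention over the description.
        self.word_attn = nn.Linear(word_dim, 1)

    def forward(self, proposals, words):
        # proposals: (P, visual_dim) region-proposal features
        # words:     (T, word_dim)   word embeddings of the description
        units = torch.sigmoid(self.unit_proj(proposals))        # (P, U) unit activations
        unit_w = torch.softmax(self.unit_attn(words), dim=-1)   # (T, U) per-word unit attention
        word_w = torch.softmax(self.word_attn(words), dim=0)    # (T, 1) word-level attention

        # Element-wise multiplication: each word's attention weights gate the
        # proposal's unit activations; word-level attention pools the scores.
        per_word = units.unsqueeze(1) * unit_w.unsqueeze(0)     # (P, T, U)
        per_word_score = per_word.sum(dim=-1)                   # (P, T)
        affinity = (per_word_score * word_w.squeeze(-1)).sum(-1)  # (P,)
        return affinity  # higher = proposal better matches the description

# Toy usage with random tensors
if __name__ == "__main__":
    model = WordProposalAffinity(visual_dim=2048, word_dim=300, num_units=64)
    props = torch.randn(10, 2048)    # 10 region proposals
    desc = torch.randn(12, 300)      # 12-word description
    print(model(props, desc).shape)  # torch.Size([10])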

Cite

Text

Li et al. "Zero-Shot Object Detection with Textual Descriptions." AAAI Conference on Artificial Intelligence, 2019. doi:10.1609/AAAI.V33I01.33018690

Markdown

[Li et al. "Zero-Shot Object Detection with Textual Descriptions." AAAI Conference on Artificial Intelligence, 2019.](https://mlanthology.org/aaai/2019/li2019aaai-zero-a/) doi:10.1609/AAAI.V33I01.33018690

BibTeX

@inproceedings{li2019aaai-zero-a,
  title     = {{Zero-Shot Object Detection with Textual Descriptions}},
  author    = {Li, Zhihui and Yao, Lina and Zhang, Xiaoqin and Wang, Xianzhi and Kanhere, Salil S. and Zhang, Huaxiang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {8690--8697},
  doi       = {10.1609/AAAI.V33I01.33018690},
  url       = {https://mlanthology.org/aaai/2019/li2019aaai-zero-a/}
}