Weakly Supervised Open-Vocabulary Object Detection

Lin, Jianghang; Shen, Yunhang; Wang, Bingquan; Lin, Shaohui; Li, Ke; Cao, Liujuan

doi:10.1609/AAAI.V38I4.28127

AAAI 2024 pp. 3404-3412

Abstract

Although weakly supervised object detection (WSOD) is a promising step toward avoiding strong instance-level annotations, its capability is confined to closed-set categories within a single training dataset. In this paper, we propose a novel weakly supervised open-vocabulary object detection framework, namely WSOVOD, to extend traditional WSOD to detect novel concepts and utilize diverse datasets with only image-level annotations. To achieve this, we explore three vital strategies: dataset-level feature adaptation, image-level salient object localization, and region-level vision-language alignment. First, we perform data-aware feature extraction to produce an input-conditional coefficient, which is leveraged into dataset attribute prototypes to identify dataset bias and help achieve cross-dataset generalization. Second, a customized location-oriented weakly supervised region proposal network is proposed to utilize high-level semantic layouts from the category-agnostic Segment Anything Model to distinguish object boundaries. Lastly, we introduce a proposal-concept synchronized multiple-instance network, i.e., object mining and refinement with visual-semantic alignment, to discover objects matched to the text embeddings of concepts. Extensive experiments on Pascal VOC and MS COCO demonstrate that the proposed WSOVOD achieves a new state of the art over previous WSOD methods in both closed-set object localization and detection tasks. Meanwhile, WSOVOD enables cross-dataset and open-vocabulary learning to achieve on-par or even better performance than well-established fully-supervised open-vocabulary object detection (FSOVOD).
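
To make the third strategy more concrete, the following is a minimal sketch of multiple-instance learning over region-to-text similarities, in the spirit of a generic two-stream weakly supervised detector. It is not the authors' implementation: the function names (`image_level_scores`, `bce_loss`), the cosine-similarity logits, the shared logits for both streams, and the `temperature` value are all assumptions made for illustration, and the object refinement stage mentioned in the abstract is omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two vectors (small epsilon avoids 0/0)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def image_level_scores(region_feats, text_embs, temperature=0.1):
    """Aggregate proposal-concept similarities into image-level scores.

    region_feats: one feature vector per region proposal (hypothetical input).
    text_embs:    one text embedding per concept name (hypothetical input).
    Returns (scores, image_scores), where scores[r][c] is the per-proposal
    score and image_scores[c] is its sum over proposals.
    """
    n_r, n_c = len(region_feats), len(text_embs)
    # Assumption: both streams share the same similarity logits.
    logits = [[cosine(r, t) / temperature for t in text_embs]
              for r in region_feats]
    # Classification stream: softmax over concepts for each proposal.
    cls = [softmax(row) for row in logits]
    # Detection stream: softmax over proposals for each concept.
    det = [softmax([logits[r][c] for r in range(n_r)]) for c in range(n_c)]
    scores = [[cls[r][c] * det[c][r] for c in range(n_c)] for r in range(n_r)]
    image_scores = [sum(scores[r][c] for r in range(n_r)) for c in range(n_c)]
    return scores, image_scores


def bce_loss(image_scores, labels, eps=1e-7):
    """Binary cross-entropy against image-level labels (1 = concept present)."""
    total = 0.0
    for s, y in zip(image_scores, labels):
        s = min(max(s, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return total / len(labels)
```

Because each detection-stream column sums to one and each classification score is at most one, every image-level score lies in [0, 1], so it can be trained with only image-level labels. Swapping the list of text embeddings at inference time is what would allow scoring concepts that were not seen during training.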

Cite

Text

Lin et al. "Weakly Supervised Open-Vocabulary Object Detection." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I4.28127

Markdown

[Lin et al. "Weakly Supervised Open-Vocabulary Object Detection." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/lin2024aaai-weakly/) doi:10.1609/AAAI.V38I4.28127

BibTeX

@inproceedings{lin2024aaai-weakly,
  title     = {{Weakly Supervised Open-Vocabulary Object Detection}},
  author    = {Lin, Jianghang and Shen, Yunhang and Wang, Bingquan and Lin, Shaohui and Li, Ke and Cao, Liujuan},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {3404-3412},
  doi       = {10.1609/AAAI.V38I4.28127},
  url       = {https://mlanthology.org/aaai/2024/lin2024aaai-weakly/}
}