ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues

Abstract

Object proposal generation is a fundamental task in computer vision. In this paper, we propose ProposalCLIP, a method for unsupervised open-category object proposal generation. Unlike previous works, which require a large number of bounding-box annotations and/or can generate proposals only for a limited set of object categories, ProposalCLIP predicts proposals for a large variety of object categories without annotations by exploiting CLIP (contrastive language-image pre-training) cues. First, we analyze CLIP for unsupervised open-category proposal generation and design an objectness score based on our empirical analysis of proposal selection. Second, we propose a graph-based merging module that addresses the limitations of CLIP cues and merges fragmented proposals. Finally, we present a proposal regression module that extracts pseudo labels based on CLIP cues and trains a lightweight network to further refine proposals. Extensive experiments on the PASCAL VOC, COCO, and Visual Genome datasets show that ProposalCLIP generates better proposals than previous state-of-the-art methods. ProposalCLIP also benefits downstream tasks such as unsupervised object detection.
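The abstract does not say how the graph-based merging module builds its graph, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that two proposals are linked when their boxes overlap and their feature vectors are similar, and that each connected group is replaced by its enclosing box. The function names, the two thresholds (`iou_thr`, `sim_thr`), and the use of plain Python tuples in place of real CLIP image embeddings are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def merge_fragments(boxes, feats, iou_thr=0.1, sim_thr=0.9):
    """Merge proposals that form connected components of a similarity graph.

    Assumption (not from the paper): an edge joins two proposals when both
    their IoU and their feature cosine similarity reach the thresholds.
    """
    parent = list(range(len(boxes)))  # union-find forest over proposal indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(boxes)), 2):
        if iou(boxes[i], boxes[j]) >= iou_thr and cosine(feats[i], feats[j]) >= sim_thr:
            parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    # Replace each group with the smallest box that encloses all its members.
    return [
        (min(b[0] for b in g), min(b[1] for b in g),
         max(b[2] for b in g), max(b[3] for b in g))
        for g in groups.values()
    ]
```

With the toy inputs `boxes = [(0, 0, 10, 10), (5, 0, 15, 10), (100, 100, 110, 110)]` and `feats = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]`, the first two overlapping, similar boxes are fused into one enclosing box and the distant third box is left unchanged. In a real pipeline the feature vectors would come from a CLIP image encoder applied to each cropped proposal.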

Cite

Text

Shi et al. "ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00939

Markdown

[Shi et al. "ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/shi2022cvpr-proposalclip/) doi:10.1109/CVPR52688.2022.00939

BibTeX

@inproceedings{shi2022cvpr-proposalclip,
  title     = {{ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues}},
  author    = {Shi, Hengcan and Hayat, Munawar and Wu, Yicheng and Cai, Jianfei},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {9611-9620},
  doi       = {10.1109/CVPR52688.2022.00939},
  url       = {https://mlanthology.org/cvpr/2022/shi2022cvpr-proposalclip/}
}