ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues
Abstract
Object proposal generation is an important and fundamental task in computer vision. In this paper, we propose ProposalCLIP, a method towards unsupervised open-category object proposal generation. Unlike previous works which require a large number of bounding box annotations and/or can only generate proposals for limited object categories, our ProposalCLIP is able to predict proposals for a large variety of object categories without annotations, by exploiting CLIP (contrastive language-image pre-training) cues. Firstly, we analyze CLIP for unsupervised open-category proposal generation and design an objectness score based on our empirical analysis on proposal selection. Secondly, a graph-based merging module is proposed to solve the limitations of CLIP cues and merge fragmented proposals. Finally, we present a proposal regression module that extracts pseudo labels based on CLIP cues and trains a lightweight network to further refine proposals. Extensive experiments on PASCAL VOC, COCO and Visual Genome datasets show that our ProposalCLIP can better generate proposals than previous state-of-the-art methods. Our ProposalCLIP also shows benefits for downstream tasks, such as unsupervised object detection.
Cite
Text
Shi et al. "ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00939
Markdown
[Shi et al. "ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/shi2022cvpr-proposalclip/) doi:10.1109/CVPR52688.2022.00939
BibTeX
@inproceedings{shi2022cvpr-proposalclip,
  title = {{ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues}},
  author = {Shi, Hengcan and Hayat, Munawar and Wu, Yicheng and Cai, Jianfei},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year = {2022},
  pages = {9611-9620},
  doi = {10.1109/CVPR52688.2022.00939},
  url = {https://mlanthology.org/cvpr/2022/shi2022cvpr-proposalclip/}
}