Cut and Learn for Unsupervised Object Detection and Instance Segmentation
Abstract
We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models. We leverage the property of self-supervised models to 'discover' objects without supervision and amplify it to train a state-of-the-art localization model without any human labels. CutLER first uses our proposed MaskCut approach to generate coarse masks for multiple objects in an image, and then learns a detector on these masks using our robust loss function. We further improve performance by self-training the model on its predictions. Compared to prior work, CutLER is simpler, compatible with different detection architectures, and detects multiple objects. CutLER is also a zero-shot unsupervised detector and improves detection performance AP_50 by over 2.7x on 11 benchmarks across domains like video frames, paintings, sketches, etc. With finetuning, CutLER serves as a low-shot detector surpassing MoCo-v2 by 7.3% AP^box and 6.6% AP^mask on COCO when training with 5% labels.
Cite
Text
Wang et al. "Cut and Learn for Unsupervised Object Detection and Instance Segmentation." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00305
Markdown
[Wang et al. "Cut and Learn for Unsupervised Object Detection and Instance Segmentation." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/wang2023cvpr-cut/) doi:10.1109/CVPR52729.2023.00305
BibTeX
@inproceedings{wang2023cvpr-cut,
title = {{Cut and Learn for Unsupervised Object Detection and Instance Segmentation}},
author = {Wang, Xudong and Girdhar, Rohit and Yu, Stella X. and Misra, Ishan},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2023},
pages = {3124-3134},
doi = {10.1109/CVPR52729.2023.00305},
url = {https://mlanthology.org/cvpr/2023/wang2023cvpr-cut/}
}