Cut and Learn for Unsupervised Object Detection and Instance Segmentation

Abstract

We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models. We leverage the property of self-supervised models to 'discover' objects without supervision and amplify it to train a state-of-the-art localization model without any human labels. CutLER first uses our proposed MaskCut approach to generate coarse masks for multiple objects in an image, and then learns a detector on these masks using our robust loss function. We further improve performance by self-training the model on its predictions. Compared to prior work, CutLER is simpler, compatible with different detection architectures, and detects multiple objects. CutLER is also a zero-shot unsupervised detector and improves detection performance AP_50 by over 2.7x on 11 benchmarks across domains like video frames, paintings, sketches, etc. With finetuning, CutLER serves as a low-shot detector surpassing MoCo-v2 by 7.3% AP^box and 6.6% AP^mask on COCO when training with 5% labels.
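The MaskCut step described above repeatedly applies a Normalized Cut to a patch-affinity graph, takes one foreground mask per round, then removes those patches and repeats. The sketch below illustrates that loop on synthetic patch features (the paper uses DINO self-supervised features, and a maximal-attention-based foreground criterion rather than the simplified smaller-side heuristic used here); `maskcut_sketch`, `tau`, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh

def maskcut_sketch(features, n_masks=2, tau=0.15):
    """Simplified MaskCut-style loop: solve a Normalized Cut on a
    patch-affinity graph, keep one foreground mask, drop those patches
    from the graph, and repeat. `features` is an (N, d) array of patch
    embeddings (synthetic here; the paper uses DINO features)."""
    N = features.shape[0]
    # cosine-similarity affinity between patches
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    W_full = f @ f.T
    active = np.ones(N, dtype=bool)  # patches still in the graph
    masks = []
    for _ in range(n_masks):
        idx = np.where(active)[0]
        if idx.size < 2:
            break
        W = W_full[np.ix_(idx, idx)].copy()
        W = np.where(W > tau, W, 1e-5)  # suppress weak edges
        D = np.diag(W.sum(axis=1))
        # second-smallest generalized eigenvector of (D - W) x = lambda D x
        _, vecs = eigh(D - W, D)
        v = vecs[:, 1]
        fg = v > v.mean()  # bipartition around the mean value
        # take the smaller side as foreground (simplified criterion; the
        # paper instead anchors foreground on the maximal-attention patch)
        if fg.sum() > (~fg).sum():
            fg = ~fg
        mask = np.zeros(N, dtype=bool)
        mask[idx[fg]] = True
        masks.append(mask)
        active[idx[fg]] = False  # remove found object, cut again
    return masks
```

Because each round deletes the patches it just segmented, successive cuts are forced onto different regions, which is how MaskCut extends single-object NCut-style discovery to multiple objects per image.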

Cite

Text

Wang et al. "Cut and Learn for Unsupervised Object Detection and Instance Segmentation." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00305

Markdown

[Wang et al. "Cut and Learn for Unsupervised Object Detection and Instance Segmentation." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/wang2023cvpr-cut/) doi:10.1109/CVPR52729.2023.00305

BibTeX

@inproceedings{wang2023cvpr-cut,
  title     = {{Cut and Learn for Unsupervised Object Detection and Instance Segmentation}},
  author    = {Wang, Xudong and Girdhar, Rohit and Yu, Stella X. and Misra, Ishan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {3124--3134},
  doi       = {10.1109/CVPR52729.2023.00305},
  url       = {https://mlanthology.org/cvpr/2023/wang2023cvpr-cut/}
}