Segment Anything Without Supervision

Abstract

The Segment Anything Model (SAM) requires labor-intensive data labeling. We present Unsupervised SAM (UnSAM) for promptable and automatic whole-image segmentation that does not require human annotations. UnSAM utilizes a divide-and-conquer strategy to “discover” the hierarchical structure of visual scenes. We first leverage top-down clustering methods to partition an unlabeled image into instance/semantic level segments. For all pixels within a segment, a bottom-up clustering method is employed to iteratively merge them into larger groups, thereby forming a hierarchical structure. These unsupervised multi-granular masks are then utilized to supervise model training. Evaluated across seven popular datasets, UnSAM achieves results competitive with its supervised counterpart, SAM, and surpasses the previous state-of-the-art in unsupervised segmentation by 11% in terms of AR. Moreover, we show that supervised SAM can also benefit from our self-supervised labels. By integrating our unsupervised pseudo masks into SA-1B’s ground-truth masks and training UnSAM with only 1% of SA-1B, a lightly semi-supervised UnSAM can often segment entities overlooked by supervised SAM, exceeding SAM’s AR by over 6.7% and AP by 3.9% on SA-1B.
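
As a rough illustration of the bottom-up merging step described in the abstract, the following minimal Python sketch (not the authors' implementation; the per-region features, adjacency pairs, and similarity thresholds are all assumptions for illustration) agglomeratively merges adjacent regions whose feature similarity clears a progressively lower threshold, recording one mask level per round:

    import numpy as np

    def bottom_up_hierarchy(features, adjacency, thresholds):
        """Hypothetical sketch of bottom-up hierarchical merging.
        features:   (N, D) array of per-region embeddings, e.g. from a
                    self-supervised backbone (assumed input, not the paper's).
        adjacency:  iterable of (i, j) index pairs for spatially adjacent regions.
        thresholds: decreasing cosine-similarity cutoffs, one per hierarchy level.
        Returns one label array per level, coarser at each round."""
        labels = np.arange(len(features))  # each region starts as its own group
        levels = []
        for t in thresholds:               # e.g. [0.9, 0.7, 0.5]
            for i, j in adjacency:
                a, b = labels[i], labels[j]
                if a == b:
                    continue               # already in the same group
                # cosine similarity between the two regions' features
                # (a simplification: group-mean features would also be reasonable)
                sim = features[i] @ features[j] / (
                    np.linalg.norm(features[i]) * np.linalg.norm(features[j]) + 1e-8)
                if sim > t:
                    labels[labels == b] = a  # merge group b into group a
            levels.append(labels.copy())     # snapshot this granularity level
        return levels

    # toy usage: 4 regions in a chain, three hierarchy levels
    feats = np.random.rand(4, 8)
    levels = bottom_up_hierarchy(feats, [(0, 1), (1, 2), (2, 3)], [0.9, 0.7, 0.5])

Lowering the threshold each round is what yields multi-granular masks: early rounds keep fine-grained parts separate, while later rounds fuse them into whole entities.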

Cite

Text

Wang et al. "Segment Anything Without Supervision." Neural Information Processing Systems, 2024. doi:10.52202/079017-4401

Markdown

[Wang et al. "Segment Anything Without Supervision." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/wang2024neurips-segment/) doi:10.52202/079017-4401

BibTeX

@inproceedings{wang2024neurips-segment,
  title     = {{Segment Anything Without Supervision}},
  author    = {Wang, XuDong and Yang, Jingfeng and Darrell, Trevor},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-4401},
  url       = {https://mlanthology.org/neurips/2024/wang2024neurips-segment/}
}