Diffuse Attend and Segment: Unsupervised Zero-Shot Segmentation Using Stable Diffusion

Abstract

Producing quality segmentation masks for images is a fundamental problem in computer vision. Recent research has explored large-scale supervised training to enable zero-shot transfer segmentation on virtually any image style, and unsupervised training to enable segmentation without dense annotations. However, constructing a model capable of segmenting anything in a zero-shot manner without any annotations is still challenging. In this paper, we propose to utilize the self-attention layers in Stable Diffusion models to achieve this goal, because the pre-trained Stable Diffusion model has learned inherent concepts of objects within its attention layers. Specifically, we introduce a simple yet effective iterative merging process based on measuring the KL divergence among attention maps to merge them into valid segmentation masks. The proposed method requires neither training nor language dependency to extract quality segmentation masks for any image. On COCO-Stuff-27, our method surpasses the prior unsupervised zero-shot transfer SOTA method by an absolute 26% in pixel accuracy and 17% in mean IoU.
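To make the merging idea concrete, below is a minimal, illustrative Python sketch (not the authors' implementation) of KL-divergence-based iterative merging. It assumes the attention maps are already extracted as 2D arrays, uses a symmetric KL divergence, averages maps when merging, and uses a hypothetical `threshold` value and toy random maps purely for demonstration; in the paper the maps come from the self-attention layers of a pre-trained Stable Diffusion U-Net.

```python
import numpy as np

def kl_div(p, q, eps=1e-8):
    # KL divergence between two attention maps treated as probability distributions.
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def merge_attention_maps(maps, threshold=0.05):
    # Iteratively merge the closest pair of maps (symmetric KL below `threshold`,
    # an assumed hyperparameter) by averaging, until no pair is close enough.
    maps = [m.astype(np.float64) for m in maps]
    while len(maps) > 1:
        best = None  # (divergence, i, j)
        for i in range(len(maps)):
            for j in range(i + 1, len(maps)):
                d = 0.5 * (kl_div(maps[i], maps[j]) + kl_div(maps[j], maps[i]))
                if d < threshold and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            break
        _, i, j = best
        fused = 0.5 * (maps[i] + maps[j])
        maps = [m for k, m in enumerate(maps) if k not in (i, j)] + [fused]
    return maps

# Toy example: random 8x8 "attention maps" standing in for real ones.
rng = np.random.default_rng(0)
toy_maps = [rng.random((8, 8)) for _ in range(6)]
clusters = merge_attention_maps(toy_maps, threshold=0.05)
labels = np.argmax(np.stack(clusters), axis=0)  # per-pixel mask assignment
print(len(clusters), labels.shape)
```

The surviving merged maps act as cluster proposals, and assigning each pixel to its highest-responding map yields a segmentation mask; the real method additionally aggregates attention across layers and resolutions.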

Cite

Text

Tian et al. "Diffuse Attend and Segment: Unsupervised Zero-Shot Segmentation Using Stable Diffusion." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00341

Markdown

[Tian et al. "Diffuse Attend and Segment: Unsupervised Zero-Shot Segmentation Using Stable Diffusion." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/tian2024cvpr-diffuse/) doi:10.1109/CVPR52733.2024.00341

BibTeX

@inproceedings{tian2024cvpr-diffuse,
  title     = {{Diffuse Attend and Segment: Unsupervised Zero-Shot Segmentation Using Stable Diffusion}},
  author    = {Tian, Junjiao and Aggarwal, Lavisha and Colaco, Andrea and Kira, Zsolt and Gonzalez-Franco, Mar},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {3554--3563},
  doi       = {10.1109/CVPR52733.2024.00341},
  url       = {https://mlanthology.org/cvpr/2024/tian2024cvpr-diffuse/}
}