Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut

Yangtao Wang, Xi Shen, Shell Xu Hu, Yuan Yuan, James L. Crowley, Dominique Vaufreydaz

CVPR 2022 pp. 14543-14553

doi:10.1109/CVPR52688.2022.01414

Abstract

Transformers trained with self-supervision using a self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we present a graph-based method that uses self-supervised transformer features to discover an object in an image. Visual tokens are viewed as nodes in a weighted graph, with edges representing a connectivity score based on the similarity of tokens. Foreground objects can then be segmented using a normalized graph cut to group self-similar regions. We solve the graph-cut problem using spectral clustering with generalized eigen-decomposition and show that the second smallest eigenvector provides a cutting solution, since its absolute value indicates the likelihood that a token belongs to a foreground object. Despite its simplicity, this approach significantly boosts the performance of unsupervised object discovery: we improve over the recent state-of-the-art LOST by a margin of 6.9%, 8.1%, and 8.1% on VOC07, VOC12, and COCO20K, respectively. The performance can be further improved by adding a second-stage class-agnostic detector (CAD). Our proposed method can be easily extended to unsupervised saliency detection and weakly supervised object detection. For unsupervised saliency detection, we improve IoU by 4.9%, 5.2%, and 12.9% on ECSSD, DUTS, and DUT-OMRON, respectively, compared to the state of the art. For weakly supervised object detection, we achieve competitive performance on CUB and ImageNet. Our code is available at: https://www.m-psi.fr/Papers/TokenCut2022/
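The pipeline described in the abstract (tokens as graph nodes, similarity-based edges, a normalized cut solved through a generalized eigenproblem, and the second smallest eigenvector as the cutting solution) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' released code: the function name `ncut_bipartition`, the edge threshold `tau`, the weak-edge weight `eps`, and the rule of splitting at the eigenvector mean are all assumptions made for the example, and the token features are taken as a plain array rather than extracted from a DINO model.

```python
import numpy as np


def ncut_bipartition(features, tau=0.2, eps=1e-5):
    """Split tokens into two groups with a normalized cut (illustrative sketch).

    features: (n_tokens, dim) array of token descriptors.
    tau, eps: hypothetical edge threshold and weak-edge weight (assumptions).
    Returns (foreground_mask, second_smallest_eigenvector).
    """
    # Cosine similarity between every pair of tokens.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T

    # Edge weights: a strong edge for similar tokens, a tiny weight otherwise
    # so that the graph stays fully connected.
    w = np.where(sim > tau, 1.0, eps)

    # Generalized eigenproblem (D - W) y = lambda * D y, solved through the
    # equivalent symmetric problem on I - D^{-1/2} W D^{-1/2}.
    d = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    lap_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap_sym)  # eigenvalues come back in ascending order
    y = d_inv_sqrt * vecs[:, 1]        # second smallest generalized eigenvector

    # Assumed decision rule: split at the mean, then label as foreground the
    # side containing the token with the largest absolute eigenvector value.
    side = y > y.mean()
    seed = np.argmax(np.abs(y))
    foreground = side == side[seed]
    return foreground, y
```

On a toy input with three tokens pointing along one direction and four along an orthogonal one, the sketch places the three-token group in the foreground, because that group has the smaller graph volume and therefore the larger absolute eigenvector entries.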

Cite

Text

Wang et al. "Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.01414

Markdown

[Wang et al. "Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/wang2022cvpr-selfsupervised-a/) doi:10.1109/CVPR52688.2022.01414

BibTeX

@inproceedings{wang2022cvpr-selfsupervised-a,
  title     = {{Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut}},
  author    = {Wang, Yangtao and Shen, Xi and Hu, Shell Xu and Yuan, Yuan and Crowley, James L. and Vaufreydaz, Dominique},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {14543--14553},
  doi       = {10.1109/CVPR52688.2022.01414},
  url       = {https://mlanthology.org/cvpr/2022/wang2022cvpr-selfsupervised-a/}
}