NamedMask: Distilling Segmenters from Complementary Foundation Models

Shin, Gyungin; Xie, Weidi; Albanie, Samuel

doi:10.1109/CVPRW59228.2023.00524

CVPRW 2023 pp. 4961-4970

Abstract

The goal of this work is to segment and name regions of images without access to pixel-level labels during training. To tackle this task, we construct segmenters by distilling the complementary strengths of two foundation models. The first, CLIP [26], exhibits the ability to assign names to image content but lacks an accessible representation of object structure. The second, DINO [5], captures the spatial extent of objects but has no knowledge of object names. Our method, termed NamedMask, begins by using CLIP to construct category-specific archives of images. These images are pseudo-labelled with a category-agnostic salient object detector bootstrapped from DINO, then refined by category-specific segmenters using the CLIP archive labels. Thanks to the high quality of the refined masks, we show that a standard segmentation architecture trained on these archives with appropriate data augmentation achieves impressive semantic segmentation abilities for both single-object and multi-object images. As a result, our proposed NamedMask performs favourably against a range of prior work on five benchmarks including the VOC2012, COCO and large-scale ImageNet-S datasets.
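The archive-construction and pseudo-labelling steps summarised in the abstract can be illustrated with a minimal sketch. It assumes that CLIP image and text embeddings have already been computed (toy 2-D vectors stand in for them here), that an archive is simply the top-k images ranked by cosine similarity to a category's text embedding, and that a binary salient-object mask (e.g. from a DINO-bootstrapped detector) is turned into a semantic label by assigning the archive's class index to salient pixels, with index 0 reserved for background. The names `build_archives` and `to_semantic_label` are hypothetical. The category-specific mask refinement and the final segmenter training are not shown, and this is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_archives(
    image_embs: Dict[str, List[float]],
    text_embs: Dict[str, List[float]],
    k: int,
) -> Dict[str, List[str]]:
    """Hypothetical helper: for each category, keep the k images whose
    (assumed CLIP) embedding is most similar to that category's text embedding."""
    archives: Dict[str, List[str]] = {}
    for category, text_emb in text_embs.items():
        ranked = sorted(
            image_embs,
            key=lambda img: cosine(image_embs[img], text_emb),
            reverse=True,
        )
        archives[category] = ranked[:k]
    return archives


def to_semantic_label(saliency: List[List[int]], class_index: int) -> List[List[int]]:
    """Hypothetical helper: turn a category-agnostic binary mask into a semantic
    mask. Salient pixels get the archive's class index; the rest become 0 (background)."""
    return [[class_index if v else 0 for v in row] for row in saliency]


if __name__ == "__main__":
    # Toy stand-ins for CLIP embeddings (assumption, not real model outputs).
    image_embs = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [0.9, 0.1]}
    text_embs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

    print(build_archives(image_embs, text_embs, k=2))
    # {'cat': ['img_a', 'img_c'], 'dog': ['img_b', 'img_c']}

    print(to_semantic_label([[0, 1], [1, 1]], class_index=3))
    # [[0, 3], [3, 3]]
```

Ranking by cosine similarity keeps the sketch independent of embedding scale. Note that an image can land in more than one archive (here `img_c`), so any real pipeline would need a policy for such overlaps.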

Cite

Text

Shin et al. "NamedMask: Distilling Segmenters from Complementary Foundation Models." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00524

Markdown

[Shin et al. "NamedMask: Distilling Segmenters from Complementary Foundation Models." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/shin2023cvprw-namedmask/) doi:10.1109/CVPRW59228.2023.00524

BibTeX

@inproceedings{shin2023cvprw-namedmask,
  title     = {{NamedMask: Distilling Segmenters from Complementary Foundation Models}},
  author    = {Shin, Gyungin and Xie, Weidi and Albanie, Samuel},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {4961--4970},
  doi       = {10.1109/CVPRW59228.2023.00524},
  url       = {https://mlanthology.org/cvprw/2023/shin2023cvprw-namedmask/}
}