Language-Driven Semantic Segmentation
Abstract
We present LSeg, a novel model for language-driven semantic image segmentation. LSeg uses a text encoder to compute embeddings of descriptive input labels (e.g., "grass" or "building") together with a transformer-based image encoder that computes dense per-pixel embeddings of the input image. The image encoder is trained with a contrastive objective to align pixel embeddings to the text embedding of the corresponding semantic class. The text embeddings provide a flexible label representation in which semantically similar labels map to similar regions in the embedding space (e.g., "cat" and "furry"). This allows LSeg to generalize to previously unseen categories at test time, without retraining or even requiring a single additional training sample. We demonstrate that our approach achieves highly competitive zero-shot performance compared to existing zero- and few-shot semantic segmentation methods, and even matches the accuracy of traditional segmentation algorithms when a fixed label set is provided. Code and demo are available at https://github.com/isl-org/lang-seg.
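The abstract describes classifying each pixel by aligning its embedding with the text embeddings of the candidate labels. A minimal sketch of that scoring step (not the official implementation; shapes, names, and the use of random embeddings are illustrative assumptions) might look like:

```python
import numpy as np

def segment(pixel_emb: np.ndarray, label_emb: np.ndarray) -> np.ndarray:
    """Assign each pixel the label whose text embedding it is most similar to.

    pixel_emb: (H, W, C) dense per-pixel embeddings from the image encoder.
    label_emb: (K, C) one text-encoder embedding per candidate label.
    Returns an (H, W) map of label indices.
    """
    # L2-normalize so the dot product equals cosine similarity.
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    t = label_emb / np.linalg.norm(label_emb, axis=-1, keepdims=True)
    scores = np.einsum("hwc,kc->hwk", p, t)  # per-pixel similarity to each label
    return scores.argmax(axis=-1)

# Toy usage with random stand-in embeddings and a hypothetical label set;
# in LSeg the label set can change at test time without retraining.
rng = np.random.default_rng(0)
labels = ["grass", "building", "other"]
pixels = rng.normal(size=(4, 4, 8))
texts = rng.normal(size=(len(labels), 8))
mask = segment(pixels, texts)
```

Because classification reduces to nearest text embedding, swapping in unseen labels only requires re-encoding the label strings.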
Cite
Text
Li et al. "Language-Driven Semantic Segmentation." International Conference on Learning Representations, 2022.

Markdown
[Li et al. "Language-Driven Semantic Segmentation." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/li2022iclr-languagedriven/)

BibTeX
@inproceedings{li2022iclr-languagedriven,
title = {{Language-Driven Semantic Segmentation}},
author = {Li, Boyi and Weinberger, Kilian Q. and Belongie, Serge and Koltun, Vladlen and Ranftl, Ren{\'e}},
booktitle = {International Conference on Learning Representations},
year = {2022},
url = {https://mlanthology.org/iclr/2022/li2022iclr-languagedriven/}
}