Locate Then Segment: A Strong Pipeline for Referring Image Segmentation

Jing, Ya; Kong, Tao; Wang, Wei; Wang, Liang; Li, Lei; Tan, Tieniu

doi:10.1109/CVPR46437.2021.00973

CVPR 2021 pp. 9858-9867

Abstract

Referring image segmentation aims to segment the objects referred to by a natural language expression. Previous methods usually focus on designing an implicit, recurrent feature interaction mechanism that fuses visual and linguistic features to directly generate the final segmentation mask, without explicitly modeling the localization of the referent guided by the language expression and without designing a powerful segmentation module. To tackle these problems, we view the task from another perspective by decoupling it into a "locate-then-segment" (LTS) scheme. Given a language expression, people generally first attend to the corresponding target image regions, then generate a segmentation mask of the object based on its context. LTS first extracts and fuses visual and textual features to get a cross-modal representation, then applies a cross-modal interaction on the visual-textual features to locate the referred object with a position prior, and finally generates the segmentation result with a lightweight network. Our LTS is simple but surprisingly effective. On three popular benchmark datasets, LTS outperforms all previous state-of-the-art methods by a large margin (e.g., +3.2% on RefCOCO+ and +3.4% on RefCOCOg). In addition, our model is more interpretable because it explicitly locates the object, which is also supported by visualization experiments. Accordingly, this framework is a promising pipeline for referring image segmentation.
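The three stages named in the abstract (fuse, locate, segment) can be illustrated with a toy NumPy sketch. This is not the authors' architecture: the function names, the element-wise-product fusion, the softmax position prior, and the single linear segmentation head are all assumptions chosen only to make the data flow concrete.

```python
import numpy as np


def fuse(visual, text):
    """Fuse visual features (H, W, C) with a text embedding (C,).

    Assumption: fusion is an element-wise product followed by
    per-position L2 normalization; the paper may use something else.
    """
    fused = visual * text[None, None, :]
    norm = np.linalg.norm(fused, axis=-1, keepdims=True)
    return fused / np.maximum(norm, 1e-8)


def locate(fused, text):
    """Return a position prior (H, W): a softmax over all spatial
    positions of the similarity between fused features and the text."""
    scores = fused @ text                      # (H, W)
    flat = scores.reshape(-1)
    flat = np.exp(flat - flat.max())           # numerically stable softmax
    return (flat / flat.sum()).reshape(scores.shape)


def segment(fused, prior, weights, bias=0.0):
    """Hypothetical lightweight head: append the prior as an extra
    channel, apply one linear map per position, then a sigmoid."""
    prior_channel = (prior / prior.max())[..., None]            # peak scaled to 1
    features = np.concatenate([fused, prior_channel], axis=-1)  # (H, W, C + 1)
    logits = features @ weights + bias                          # (H, W)
    return 1.0 / (1.0 + np.exp(-logits))


def locate_then_segment(visual, text, weights, bias=0.0, threshold=0.5):
    """Run the three stages in order and return (prior, binary mask)."""
    fused = fuse(visual, text)
    prior = locate(fused, text)
    probs = segment(fused, prior, weights, bias)
    return prior, probs > threshold
```

With random inputs of shape `(H, W, C)` for `visual`, `(C,)` for `text`, and `(C + 1,)` for `weights`, `locate_then_segment` returns an `(H, W)` prior that sums to 1 and a boolean mask of the same shape. The explicit prior is the intermediate output that the abstract credits for the method's interpretability.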

Cite

Text

Jing et al. "Locate Then Segment: A Strong Pipeline for Referring Image Segmentation." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00973

Markdown

[Jing et al. "Locate Then Segment: A Strong Pipeline for Referring Image Segmentation." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/jing2021cvpr-locate/) doi:10.1109/CVPR46437.2021.00973

BibTeX

@inproceedings{jing2021cvpr-locate,
  title     = {{Locate Then Segment: A Strong Pipeline for Referring Image Segmentation}},
  author    = {Jing, Ya and Kong, Tao and Wang, Wei and Wang, Liang and Li, Lei and Tan, Tieniu},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {9858--9867},
  doi       = {10.1109/CVPR46437.2021.00973},
  url       = {https://mlanthology.org/cvpr/2021/jing2021cvpr-locate/}
}