Zero-Shot Referring Image Segmentation with Global-Local Context Features
Abstract
Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression grounded to a region of the input image. Collecting labelled datasets for this task, however, is notoriously costly and labor-intensive. To overcome this issue, we propose a simple yet effective zero-shot referring image segmentation method by leveraging the pre-trained cross-modal knowledge from CLIP. In order to obtain segmentation masks grounded to the input text, we propose a mask-guided visual encoder that captures global and local contextual information of an input image. By utilizing instance masks obtained from off-the-shelf mask proposal techniques, our method is able to segment fine-detailed instance-level groundings. We also introduce a global-local text encoder where the global feature captures complex sentence-level semantics of the entire input expression while the local feature focuses on the target noun phrase extracted by a dependency parser. In our experiments, the proposed method outperforms several zero-shot baselines of the task and even the weakly supervised referring expression segmentation method with substantial margins. Our code is available at https://github.com/Seonghoon-Yu/Zero-shot-RIS.
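The abstract outlines the core recipe: obtain candidate instance masks from an off-the-shelf mask proposal method, extract the target noun phrase from the expression with a dependency parser, and score each masked region against the text using CLIP. The sketch below illustrates that scoring loop under stated assumptions: it relies on the OpenAI `clip` package and spaCy, the helper names (`target_noun_phrase`, `score_masks`) are illustrative, and the simple averaging of the sentence-level and noun-phrase text features stands in for the paper's global-local fusion.

```python
# Minimal sketch of zero-shot RIS scoring: rank candidate instance masks by
# CLIP similarity between masked image regions and the referring expression.
# Assumes the `clip` (openai) and `spacy` packages; the exact global-local
# feature construction of the paper is simplified here.
import clip
import numpy as np
import spacy
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
nlp = spacy.load("en_core_web_sm")


def target_noun_phrase(expression: str) -> str:
    """Pick a noun phrase from the dependency parse as the 'local' text."""
    chunks = list(nlp(expression).noun_chunks)
    return chunks[0].text if chunks else expression


@torch.no_grad()
def score_masks(image: Image.Image, masks, expression: str):
    """Return one CLIP similarity score per candidate mask (higher = better)."""
    # Global text feature (full sentence) + local text feature (noun phrase),
    # combined here by a simple average as a stand-in for the paper's fusion.
    tokens = clip.tokenize([expression, target_noun_phrase(expression)]).to(device)
    text_feat = model.encode_text(tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat.mean(dim=0, keepdim=True)

    scores = []
    for m in masks:  # m: HxW boolean array from an off-the-shelf proposal network
        masked = Image.fromarray((np.array(image) * m[..., None]).astype("uint8"))
        img_feat = model.encode_image(preprocess(masked).unsqueeze(0).to(device))
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        scores.append((img_feat @ text_feat.T).item())
    return scores
```

In the paper, the visual side likewise mixes global (full-image) and local (masked) context; the sketch encodes only the masked crop for brevity.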
Cite
Text
Yu et al. "Zero-Shot Referring Image Segmentation with Global-Local Context Features." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01864

BibTeX
@inproceedings{yu2023cvpr-zeroshot,
title = {{Zero-Shot Referring Image Segmentation with Global-Local Context Features}},
author = {Yu, Seonghoon and Seo, Paul Hongsuck and Son, Jeany},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2023},
pages = {19456-19465},
doi = {10.1109/CVPR52729.2023.01864},
url = {https://mlanthology.org/cvpr/2023/yu2023cvpr-zeroshot/}
}