Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation

Abstract

Few-shot segmentation remains challenging because of the limited labeled information available for unseen classes. Most previous approaches extract high-level feature maps from a frozen visual encoder and compute pixel-wise similarity as a key prior guidance for the decoder. However, such a prior representation suffers from coarse granularity and poor generalization to new classes, since these high-level feature maps carry an obvious category bias. In this work, we propose to replace the visual prior representation with visual-text alignment capacity to capture more reliable guidance and to enhance model generalization. Specifically, we design two kinds of training-free prior information generation strategies that exploit the semantic alignment capability of the Contrastive Language-Image Pre-training (CLIP) model to locate the target class. Besides, to acquire more accurate prior guidance, we build a high-order relationship of attention maps and use it to refine the initial prior information. Experiments on both the PASCAL-5i and COCO-20i datasets show that our method achieves a substantial improvement and sets a new state of the art. The code is available on the project website.
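
To illustrate the general idea of a training-free, CLIP-based prior, the sketch below computes cosine similarity between per-patch visual embeddings and a class-name text embedding, then min-max normalizes the result into a coarse location prior. The tensors `patch_feats` and `text_feat` are hypothetical stand-ins for outputs of a CLIP visual encoder (patch tokens) and text encoder; this is an assumption-laden illustration of the prior-map concept, not the authors' exact pipeline or their attention-map refinement.

```python
# Minimal sketch: a training-free prior map from visual-text similarity.
# Assumes patch_feats come from a CLIP visual encoder's patch tokens and
# text_feat from its text encoder for the target class name (hypothetical inputs).
import torch
import torch.nn.functional as F

def clip_prior_map(patch_feats: torch.Tensor, text_feat: torch.Tensor,
                   grid_size: int) -> torch.Tensor:
    """patch_feats: (N, D) patch-token embeddings from the visual encoder.
    text_feat: (D,) embedding of a class prompt, e.g. "a photo of a dog".
    Returns a (grid_size, grid_size) prior map with values in [0, 1]."""
    patch_feats = F.normalize(patch_feats, dim=-1)
    text_feat = F.normalize(text_feat, dim=-1)
    sim = patch_feats @ text_feat                              # (N,) cosine similarity per patch
    sim = (sim - sim.min()) / (sim.max() - sim.min() + 1e-6)   # min-max normalize
    return sim.view(grid_size, grid_size)

# Toy usage with random features (a ViT-B/16 on a 224x224 image yields a 14x14 grid).
prior = clip_prior_map(torch.randn(14 * 14, 512), torch.randn(512), grid_size=14)
print(prior.shape)  # torch.Size([14, 14])
```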

Cite

Text

Wang et al. "Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00378

Markdown

[Wang et al. "Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/wang2024cvpr-rethinking/) doi:10.1109/CVPR52733.2024.00378

BibTeX

@inproceedings{wang2024cvpr-rethinking,
  title     = {{Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation}},
  author    = {Wang, Jin and Zhang, Bingfeng and Pang, Jian and Chen, Honglong and Liu, Weifeng},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {3941--3951},
  doi       = {10.1109/CVPR52733.2024.00378},
  url       = {https://mlanthology.org/cvpr/2024/wang2024cvpr-rethinking/}
}