VRP-SAM: SAM with Visual Reference Prompt

Sun, Yanpeng; Chen, Jiahui; Zhang, Shan; Zhang, Xinyu; Chen, Qiang; Zhang, Gang; Ding, Errui; Wang, Jingdong; Li, Zechao

doi:10.1109/CVPR52733.2024.02224


CVPR 2024 pp. 23565-23574


Abstract

In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that empowers the Segment Anything Model (SAM) to utilize annotated reference images as prompts for segmentation, creating the VRP-SAM model. In essence, VRP-SAM can utilize annotated reference images to comprehend specific objects and segment those objects in a target image. Notably, the VRP encoder supports a variety of annotation formats for reference images, including point, box, scribble, and mask. VRP-SAM achieves a breakthrough within the SAM framework by extending its versatility and applicability while preserving SAM's inherent strengths, thus enhancing user-friendliness. To enhance the generalization ability of VRP-SAM, the VRP encoder adopts a meta-learning strategy. To validate the effectiveness of VRP-SAM, we conducted extensive empirical studies on the Pascal and COCO datasets. Remarkably, VRP-SAM achieved state-of-the-art performance in visual reference segmentation with minimal learnable parameters. Furthermore, VRP-SAM demonstrates strong generalization capabilities, allowing it to segment unseen objects and enabling cross-domain segmentation. The source code and models will be available at https://github.com/syp2ysy/VRP-SAM.
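The abstract does not say how the four annotation formats are unified, so the following is only an illustrative sketch, not the VRP-SAM implementation. It assumes each annotation (point, box, scribble, or mask) is first rasterized into a binary mask, and that the mask then pools reference features into a single prototype vector via masked average pooling, a technique common in few-shot segmentation. The function names `annotation_to_mask` and `masked_average_pool`, the `radius` parameter, and the coordinate conventions are all hypothetical.

```python
import numpy as np


def annotation_to_mask(kind, data, shape, radius=2):
    """Rasterize a reference annotation into a boolean (H, W) mask.

    Hypothetical helper: the formats and conventions here are assumptions.
      - "mask":     data is an (H, W) array, nonzero = foreground
      - "box":      data is (x0, y0, x1, y1), inclusive pixel corners
      - "point":    data is a list of (x, y) clicks
      - "scribble": data is a list of (x, y) pixels along the stroke
    Points and scribble pixels are dilated into discs of the given radius.
    """
    h, w = shape
    if kind == "mask":
        return np.asarray(data, dtype=bool)
    mask = np.zeros((h, w), dtype=bool)
    if kind == "box":
        x0, y0, x1, y1 = data
        mask[y0:y1 + 1, x0:x1 + 1] = True
    elif kind in ("point", "scribble"):
        ys, xs = np.mgrid[0:h, 0:w]  # per-pixel row and column indices
        for x, y in data:
            mask |= (xs - x) ** 2 + (ys - y) ** 2 <= radius ** 2
    else:
        raise ValueError(f"unknown annotation kind: {kind!r}")
    return mask


def masked_average_pool(features, mask):
    """Average a (C, H, W) feature map over the foreground of an (H, W) mask.

    Returns a (C,) prototype vector summarizing the annotated region.
    """
    weights = mask.astype(features.dtype)
    total = weights.sum()
    if total == 0:
        raise ValueError("annotation mask is empty")
    # Broadcasting multiplies every channel by the same (H, W) mask.
    return (features * weights).sum(axis=(1, 2)) / total


# Toy reference features: 2 channels on a 4x4 grid.
features = np.arange(32, dtype=np.float64).reshape(2, 4, 4)
box_mask = annotation_to_mask("box", (1, 1, 2, 2), (4, 4))
prototype = masked_average_pool(features, box_mask)
point_mask = annotation_to_mask("point", [(3, 0)], (4, 4), radius=0)
```

Under these assumptions, any annotation format reduces to the same kind of mask, so the downstream pooling step does not need to know which format the user supplied.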


Cite

Text

Sun et al. "VRP-SAM: SAM with Visual Reference Prompt." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.02224

Markdown

[Sun et al. "VRP-SAM: SAM with Visual Reference Prompt." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/sun2024cvpr-vrpsam/) doi:10.1109/CVPR52733.2024.02224

BibTeX

@inproceedings{sun2024cvpr-vrpsam,
  title     = {{VRP-SAM: SAM with Visual Reference Prompt}},
  author    = {Sun, Yanpeng and Chen, Jiahui and Zhang, Shan and Zhang, Xinyu and Chen, Qiang and Zhang, Gang and Ding, Errui and Wang, Jingdong and Li, Zechao},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {23565-23574},
  doi       = {10.1109/CVPR52733.2024.02224},
  url       = {https://mlanthology.org/cvpr/2024/sun2024cvpr-vrpsam/}
}