Segment and Recognize Anything at Any Granularity

Abstract

In this work, we introduce an augmented image segmentation foundation model for segmenting and recognizing anything at desired granularities. Compared to the foundational segmentation model SAM [?], our model has two unique advantages: (i) granularity controllability, in that the model can produce segmentation masks at any desired granularity, from objects to parts to both; and (ii) semantic awareness, in that the model simultaneously predicts semantic labels for masks at different granularities. To enable multi-granularity capabilities, we propose a multi-choice learning scheme in which each click point generates a set of masks at multiple levels of granularity, trained against a corresponding set of ground-truth masks. To achieve semantic awareness, we consolidate multiple datasets with different levels of granularity and train our model on decoupled object- and part-based tasks to facilitate knowledge sharing and transfer among tasks. To the best of our knowledge, this work is the first attempt to jointly train a model on SA-1B, instance-level, and part-level segmentation datasets. Experimental results and visualizations demonstrate that our model achieves these goals. Furthermore, we show that multi-task training that combines the segmentation task defined on SA-1B with other segmentation tasks (e.g., panoptic and part segmentation) improves performance on all of these tasks. In particular, by adding SAM data we achieve a new state-of-the-art result of 60.2 PQ on COCO panoptic segmentation.
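The multi-choice learning scheme pairs several predicted masks per click with several ground-truth masks (for example, a part and the object containing it). The abstract does not say how predictions and targets are paired. The sketch below assumes a one-to-one assignment that maximizes total IoU, solved by brute force over permutations, which is practical only for a handful of outputs per click. Masks are toy sets of pixel coordinates, and the loss is the mean of (1 - IoU) over matched pairs. Both are simplifications for illustration, and the names `box`, `match_multi_choice`, and `multi_choice_loss` are hypothetical, not the authors' code.

```python
from itertools import permutations
from typing import List, Set, Tuple

# Hypothetical toy representation: a mask is the set of (row, col) pixels it covers.
Mask = Set[Tuple[int, int]]


def box(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Rectangular mask covering rows r0..r1-1 and columns c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_multi_choice(preds: List[Mask], gts: List[Mask]) -> List[Tuple[int, int]]:
    """Assign every ground-truth mask to a distinct prediction, maximizing total IoU.

    Returns (pred_index, gt_index) pairs. Brute force over permutations is an
    assumption for clarity; it only scales to a few predictions per click.
    """
    if len(gts) > len(preds):
        raise ValueError("need at least as many predictions as ground-truth masks")
    best_score, best_perm = -1.0, ()
    # perm[g] is the prediction index assigned to ground-truth mask g.
    for perm in permutations(range(len(preds)), len(gts)):
        score = sum(iou(preds[p], gts[g]) for g, p in enumerate(perm))
        if score > best_score:
            best_score, best_perm = score, perm
    return [(p, g) for g, p in enumerate(best_perm)]


def multi_choice_loss(preds: List[Mask], gts: List[Mask]) -> float:
    """Mean (1 - IoU) over matched pairs; unmatched predictions are not penalized."""
    pairs = match_multi_choice(preds, gts)
    if not pairs:
        return 0.0
    return sum(1.0 - iou(preds[p], gts[g]) for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    # One click on a "head": targets are the part (head) and the object (person).
    gts = [box(0, 0, 2, 2), box(0, 0, 4, 4)]
    # Three candidate outputs for the same click, in arbitrary order.
    preds = [box(0, 0, 4, 4), box(5, 5, 6, 6), box(0, 0, 2, 2)]
    print(match_multi_choice(preds, gts))  # [(2, 0), (0, 1)]
    print(multi_choice_loss(preds, gts))   # 0.0
```

Here the part-level target is matched to the small prediction and the object-level target to the large one, regardless of output order, while the stray prediction receives no mask loss in this sketch. A real training loop would instead apply differentiable mask losses (e.g., dice or binary cross-entropy) to predicted logits, and would typically use a polynomial-time assignment solver rather than enumerating permutations.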

Cite

Text

Li et al. "Segment and Recognize Anything at Any Granularity." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73195-2_27

BibTeX

@inproceedings{li2024eccv-segment,
  title     = {{Segment and Recognize Anything at Any Granularity}},
  author    = {Li, Feng and Zhang, Hao and Sun, Peize and Zou, Xueyan and Liu, Shilong and Li, Chunyuan and Yang, Jianwei and Zhang, Lei and Gao, Jianfeng},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73195-2_27},
  url       = {https://mlanthology.org/eccv/2024/li2024eccv-segment/}
}