Advancing Referring Expression Segmentation Beyond Single Image

Yixuan Wu, Zhao Zhang, Chi Xie, Feng Zhu, Rui Zhao

ICCV 2023 pp. 2628-2638

doi:10.1109/ICCV51070.2023.00248

Abstract

Referring Expression Segmentation (RES) is a widely explored multi-modal task, which endeavors to segment the pre-existing object within a single image with a given linguistic expression. However, in broader real-world scenarios, it is not always possible to determine if the described object exists in a specific image. Generally, a collection of images is available, some of which potentially contain the target objects. To this end, we propose a more realistic setting, named Group-wise Referring Expression Segmentation (GRES), which expands RES to a group of related images, allowing the described objects to exist in a subset of the input image group. To support this new setting, we introduce an elaborately compiled dataset named Grouped Referring Dataset (GRD), containing complete group-wise annotations of the target objects described by given expressions. Moreover, we also present a baseline method named Grouped Referring Segmenter (GRSer), which explicitly captures the language-vision and intra-group vision-vision interactions to achieve state-of-the-art results on the proposed GRES setting and related tasks, such as Co-Salient Object Detection and traditional RES. Our dataset and code are publicly available at https://github.com/shikras/d-cube.
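
The GRES setting described in the abstract can be pictured as one expression paired with a group of images, where only some images contain the described object. The following is a minimal illustrative sketch of such a sample, not the GRD file format or the GRSer interface: the class name `GroupSample`, its fields, and the example values are all hypothetical, and an absent target is assumed to be marked with `None`.

```python
from dataclasses import dataclass
from typing import List, Optional

# A binary mask as nested lists; 1 marks a pixel of the described object.
Mask = List[List[int]]


@dataclass
class GroupSample:
    """Hypothetical container for one GRES query (not the actual GRD format)."""

    expression: str
    image_ids: List[str]
    # One entry per image. None is an assumed convention meaning the
    # described object does not appear in that image.
    target_masks: List[Optional[Mask]]

    def positive_ids(self) -> List[str]:
        """Return the ids of images that contain the described object."""
        return [
            image_id
            for image_id, mask in zip(self.image_ids, self.target_masks)
            if mask is not None
        ]


# Made-up example: the object is present in two of the three images.
sample = GroupSample(
    expression="the dog on the left",
    image_ids=["img_0", "img_1", "img_2"],
    target_masks=[[[0, 1], [0, 1]], None, [[1, 0], [0, 0]]],
)
print(sample.positive_ids())
```

In traditional RES, by contrast, the group would hold a single image that is assumed to contain the target.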


Cite

Text

Wu et al. "Advancing Referring Expression Segmentation Beyond Single Image." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00248

Markdown

[Wu et al. "Advancing Referring Expression Segmentation Beyond Single Image." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/wu2023iccv-advancing/) doi:10.1109/ICCV51070.2023.00248

BibTeX

@inproceedings{wu2023iccv-advancing,
  title     = {{Advancing Referring Expression Segmentation Beyond Single Image}},
  author    = {Wu, Yixuan and Zhang, Zhao and Xie, Chi and Zhu, Feng and Zhao, Rui},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {2628-2638},
  doi       = {10.1109/ICCV51070.2023.00248},
  url       = {https://mlanthology.org/iccv/2023/wu2023iccv-advancing/}
}