PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Abstract
In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks. This is enabled by a new sequence-to-sequence framework, Polygon Transformer (PolyFormer), which takes a sequence of image patches and text query tokens as input, and outputs a sequence of polygon vertices autoregressively. For more accurate geometric localization, we propose a regression-based decoder, which predicts the precise floating-point coordinates directly, without any coordinate quantization error. In the experiments, PolyFormer outperforms the prior art by a clear margin, e.g., 5.40% and 4.52% absolute improvements on the challenging RefCOCO+ and RefCOCOg datasets. It also shows strong generalization ability when evaluated on the referring video segmentation task without fine-tuning, e.g., achieving competitive 61.5% J&F on the Ref-DAVIS17 dataset.
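The abstract notes that the predicted polygon vertices are later converted into segmentation masks. Below is a minimal sketch (not the authors' code) of how such a polygon-to-mask rasterization step could look; the helper name `polygons_to_mask`, the example coordinates, and the image size are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: rasterizing predicted floating-point polygon vertices
# into a binary segmentation mask. Illustrative only; not the paper's code.
import numpy as np
from PIL import Image, ImageDraw

def polygons_to_mask(polygons, height, width):
    """Convert a list of polygons (each a list of (x, y) floats) to a binary mask."""
    canvas = Image.new("L", (width, height), 0)
    draw = ImageDraw.Draw(canvas)
    for vertices in polygons:
        if len(vertices) >= 3:  # a valid polygon needs at least three vertices
            draw.polygon([(float(x), float(y)) for x, y in vertices],
                         outline=1, fill=1)
    return np.array(canvas, dtype=np.uint8)

# Hypothetical example: one predicted polygon for a referred object.
# Multi-part objects could be handled by drawing each polygon onto the same canvas.
mask = polygons_to_mask(
    [[(12.3, 40.1), (88.7, 35.6), (90.2, 120.4), (15.8, 118.9)]],
    height=160, width=160)
print(mask.shape, mask.sum())
```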
Cite
Text

Liu et al. "PolyFormer: Referring Image Segmentation as Sequential Polygon Generation." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01789

Markdown

[Liu et al. "PolyFormer: Referring Image Segmentation as Sequential Polygon Generation." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/liu2023cvpr-polyformer/) doi:10.1109/CVPR52729.2023.01789

BibTeX
@inproceedings{liu2023cvpr-polyformer,
title = {{PolyFormer: Referring Image Segmentation as Sequential Polygon Generation}},
author = {Liu, Jiang and Ding, Hui and Cai, Zhaowei and Zhang, Yuting and Satzoda, Ravi Kumar and Mahadevan, Vijay and Manmatha, R.},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2023},
pages = {18653--18663},
doi = {10.1109/CVPR52729.2023.01789},
url = {https://mlanthology.org/cvpr/2023/liu2023cvpr-polyformer/}
}