Instances as Queries

Abstract

We present QueryInst, a new perspective for instance segmentation. QueryInst is a multi-stage end-to-end system that treats instances of interest as learnable queries, enabling query-based object detectors such as Sparse R-CNN to achieve strong instance segmentation performance. The attributes of instances, such as categories, bounding boxes, instance masks, and instance association embeddings, are represented by queries in a unified manner. In QueryInst, a query is shared by both detection and segmentation via dynamic convolutions and is driven by multi-stage learning with parallel supervision. We conduct extensive experiments on three challenging benchmarks, i.e., COCO, Cityscapes, and YouTube-VIS, to evaluate the effectiveness of QueryInst in object detection, instance segmentation, and video instance segmentation tasks. For the first time, we demonstrate that a simple end-to-end query-based framework can achieve state-of-the-art performance in various instance-level recognition tasks. Code is available at https://github.com/hustvl/QueryInst.
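To make the idea of one query being shared by detection and segmentation through dynamic convolutions more concrete, the following is a minimal pure-Python sketch. It is not the authors' implementation. The function names (`matmul`, `dynamic_conv`, `rand_matrix`), the feature width `d = 4`, the 7x7 and 14x14 RoI sizes, and the single-projection design are all illustrative assumptions. The sketch assumes that a dynamic convolution means: project the query into the weights of a 1x1 kernel, then apply that kernel to the RoI features at every spatial position.

```python
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix given as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def dynamic_conv(query, roi_feats, proj):
    """Apply a query-conditioned 1x1 convolution to RoI features.

    query:     list of length d (one instance query)
    roi_feats: (positions x d) nested list, one row per spatial location
    proj:      (d x d*d) nested list; a hypothetical learned projection that
               turns the query into the weights of a d x d kernel
    """
    d = len(query)
    flat = matmul([query], proj)[0]                        # d*d kernel weights
    kernel = [flat[r * d:(r + 1) * d] for r in range(d)]   # reshape to d x d
    out = matmul(roi_feats, kernel)                        # 1x1 conv per position
    return [[max(0.0, v) for v in row] for row in out]     # ReLU


def rand_matrix(rows, cols):
    """Random stand-in for learned weights or extracted features."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)]
            for _ in range(rows)]


random.seed(0)
d = 4                                  # assumed feature width
query = rand_matrix(1, d)[0]           # one query, used by both branches
box_roi = rand_matrix(7 * 7, d)        # assumed 7x7 box RoI, flattened
mask_roi = rand_matrix(14 * 14, d)     # assumed 14x14 mask RoI, flattened
box_proj = rand_matrix(d, d * d)       # separate projection per branch
mask_proj = rand_matrix(d, d * d)

# The same query conditions both the detection and the mask features.
box_feat = dynamic_conv(query, box_roi, box_proj)
mask_feat = dynamic_conv(query, mask_roi, mask_proj)
print(len(box_feat), len(box_feat[0]))    # 49 4
print(len(mask_feat), len(mask_feat[0]))  # 196 4
```

In this sketch the two branches differ only in their RoI resolution and projection weights, while the query vector is identical, which is one plausible reading of a query being shared across tasks.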

Cite

Text

Fang et al. "Instances as Queries." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00683

Markdown

[Fang et al. "Instances as Queries." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/fang2021iccv-instances/) doi:10.1109/ICCV48922.2021.00683

BibTeX

@inproceedings{fang2021iccv-instances,
  title     = {{Instances as Queries}},
  author    = {Fang, Yuxin and Yang, Shusheng and Wang, Xinggang and Li, Yu and Fang, Chen and Shan, Ying and Feng, Bin and Liu, Wenyu},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {6910--6919},
  doi       = {10.1109/ICCV48922.2021.00683},
  url       = {https://mlanthology.org/iccv/2021/fang2021iccv-instances/}
}