Visual Recognition by Request

Abstract

Humans can recognize visual semantics at unlimited granularity, but existing visual recognition algorithms cannot. In this paper, we establish a new paradigm named visual recognition by request (ViRReq) to bridge the gap. The key lies in decomposing visual recognition into atomic tasks named requests and leveraging a knowledge base, a hierarchical and text-based dictionary, to assist task definition. ViRReq allows for (i) learning complicated whole-part hierarchies from highly incomplete annotations and (ii) inserting new concepts with minimal effort. We also establish a solid baseline by integrating language-driven recognition into recent semantic and instance segmentation methods, and demonstrate its flexible recognition ability on CPP and ADE20K, two datasets with hierarchical whole-part annotations.
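
As a rough illustration of the idea of a hierarchical, text-based knowledge base driving atomic requests, the toy sketch below may help. It is not the authors' implementation: the names `KnowledgeBase`, `Request`, `answer_labels`, `insert_concept`, and `expand` are hypothetical, the concept names are made up, and the actual segmentation model is omitted entirely. It only shows how a whole-part dictionary could define which labels each request asks for, and how a new concept could be added by editing the dictionary alone.

```python
from dataclasses import dataclass

# Hypothetical knowledge base: each concept name maps to the names of its parts.
# Assumed to form a tree (no cycles).
KnowledgeBase = dict[str, list[str]]


@dataclass(frozen=True)
class Request:
    """One atomic task: recognize the parts of a single concept."""
    concept: str


def answer_labels(kb: KnowledgeBase, request: Request) -> list[str]:
    """Return the text labels a model would be asked to recognize for this request."""
    return list(kb.get(request.concept, []))


def insert_concept(kb: KnowledgeBase, parent: str, name: str, parts=()) -> None:
    """Add a new concept under `parent` by editing the dictionary only."""
    kb.setdefault(parent, [])
    if name not in kb[parent]:
        kb[parent].append(name)
    kb.setdefault(name, list(parts))


def expand(kb: KnowledgeBase, root: str) -> list[Request]:
    """List requests top-down (breadth-first), one per concept that has parts."""
    queue, requests = [root], []
    while queue:
        concept = queue.pop(0)
        if kb.get(concept):
            requests.append(Request(concept))
            queue.extend(kb[concept])
    return requests


if __name__ == "__main__":
    kb = {"scene": ["person", "car"], "person": ["head", "torso"],
          "car": [], "head": [], "torso": []}
    print([r.concept for r in expand(kb, "scene")])
    insert_concept(kb, "car", "wheel")  # new part, no other entry changes
    print(answer_labels(kb, Request("car")))
```

In this sketch, a concept whose parts were never annotated simply produces no request, which loosely mirrors the abstract's point about learning from incomplete whole-part annotations.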

Cite

Text

Tang et al. "Visual Recognition by Request." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01465

Markdown

[Tang et al. "Visual Recognition by Request." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/tang2023cvpr-visual/) doi:10.1109/CVPR52729.2023.01465

BibTeX

@inproceedings{tang2023cvpr-visual,
  title     = {{Visual Recognition by Request}},
  author    = {Tang, Chufeng and Xie, Lingxi and Zhang, Xiaopeng and Hu, Xiaolin and Tian, Qi},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {15265--15274},
  doi       = {10.1109/CVPR52729.2023.01465},
  url       = {https://mlanthology.org/cvpr/2023/tang2023cvpr-visual/}
}