Unified Perceptual Parsing for Scene Understanding

Abstract

Humans recognize the visual world at multiple levels: we effortlessly categorize scenes and detect the objects inside them, while also identifying the textures and surfaces of the objects along with their different compositional parts. In this paper, we study a new task called Unified Perceptual Parsing, which requires machine vision systems to recognize as many visual concepts as possible from a given image. A multi-task framework called UPerNet and a training strategy are developed to learn from heterogeneous image annotations. We benchmark our framework on Unified Perceptual Parsing and show that it is able to effectively segment a wide range of concepts from images. The trained networks are further applied to discover visual knowledge in natural scenes.
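To make the multi-task idea concrete, the sketch below shows one way a shared backbone can feed separate heads for image-level (scene) and pixel-level (object, material) predictions, with a loss that only sums the tasks actually annotated in a given batch, which is how training can mix datasets with heterogeneous labels. This is a minimal conceptual sketch in PyTorch, not the authors' UPerNet implementation (UPerNet builds on a ResNet backbone with a feature pyramid and pyramid pooling); the class counts, layer sizes, and names such as UnifiedParsingSketch and heterogeneous_loss are placeholder assumptions for illustration.

# Minimal multi-task parsing sketch: shared backbone, several task heads,
# and a loss that skips annotation types missing from the current batch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedParsingSketch(nn.Module):
    def __init__(self, num_scene=365, num_object=150, num_material=26):
        super().__init__()
        # Tiny stand-in backbone; the real model uses a much deeper encoder.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Pixel-level heads predict per-pixel labels; the scene head pools
        # the feature map into a single image-level prediction.
        self.object_head = nn.Conv2d(128, num_object, 1)
        self.material_head = nn.Conv2d(128, num_material, 1)
        self.scene_head = nn.Linear(128, num_scene)

    def forward(self, x):
        feat = self.backbone(x)
        size = x.shape[-2:]
        return {
            "object": F.interpolate(self.object_head(feat), size=size,
                                    mode="bilinear", align_corners=False),
            "material": F.interpolate(self.material_head(feat), size=size,
                                      mode="bilinear", align_corners=False),
            "scene": self.scene_head(feat.mean(dim=(2, 3))),
        }

def heterogeneous_loss(outputs, targets):
    # `targets` may contain any subset of {"scene", "object", "material"};
    # absent tasks simply contribute no loss for this batch.
    loss = 0.0
    if "scene" in targets:
        loss = loss + F.cross_entropy(outputs["scene"], targets["scene"])
    for task in ("object", "material"):
        if task in targets:
            # ignore_index=-1 masks pixels without labels.
            loss = loss + F.cross_entropy(outputs[task], targets[task],
                                          ignore_index=-1)
    return loss

if __name__ == "__main__":
    model = UnifiedParsingSketch()
    images = torch.randn(2, 3, 64, 64)
    outputs = model(images)
    # A batch annotated only with object segmentation labels:
    targets = {"object": torch.randint(0, 150, (2, 64, 64))}
    print(heterogeneous_loss(outputs, targets))

The key design choice illustrated here is that batches drawn from different datasets can carry different label types, and the loss masks out whatever is missing, so a single network can be trained jointly on scene, object, part, material, and texture annotations.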

Cite

Text

Xiao et al. "Unified Perceptual Parsing for Scene Understanding." Proceedings of the European Conference on Computer Vision (ECCV), 2018. doi:10.1007/978-3-030-01228-1_26

Markdown

[Xiao et al. "Unified Perceptual Parsing for Scene Understanding." Proceedings of the European Conference on Computer Vision (ECCV), 2018.](https://mlanthology.org/eccv/2018/xiao2018eccv-unified/) doi:10.1007/978-3-030-01228-1_26

BibTeX

@inproceedings{xiao2018eccv-unified,
  title     = {{Unified Perceptual Parsing for Scene Understanding}},
  author    = {Xiao, Tete and Liu, Yingcheng and Zhou, Bolei and Jiang, Yuning and Sun, Jian},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2018},
  doi       = {10.1007/978-3-030-01228-1_26},
  url       = {https://mlanthology.org/eccv/2018/xiao2018eccv-unified/}
}