GINet: Graph Interaction Network for Scene Parsing
Abstract
Recently, context reasoning using image regions beyond local convolution has shown great potential for scene parsing. In this work, we explore how to incorporate linguistic knowledge to promote context reasoning over image regions by proposing a Graph Interaction unit (GI unit) and a Semantic Context Loss (SC-loss). The GI unit is capable of enhancing the feature representations of convolutional networks over high-level semantics and of learning the semantic coherency adaptively for each sample. Specifically, the dataset-based linguistic knowledge is first incorporated in the GI unit to promote context reasoning over the visual graph; the evolved representations of the visual graph are then mapped back to each local representation to enhance its discriminative capability for scene parsing. The GI unit is further improved by the SC-loss, which enhances the semantic representations over the exemplar-based semantic graph. We perform full ablation studies to demonstrate the effectiveness of each component in our approach. In particular, the proposed GINet outperforms the state-of-the-art approaches on popular benchmarks, including Pascal-Context and COCO Stuff.
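The data flow described in the abstract (local features projected onto a visual graph, reasoning guided by class-level linguistic embeddings, then mapping back to local features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `gi_unit_sketch`, the soft-assignment projection, the attention-style interaction, the residual connections, and all weight matrices are assumptions chosen only to make the three steps concrete, and the SC-loss is omitted.

```python
import numpy as np


def softmax(a, axis):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def gi_unit_sketch(X, W_assign, S, W_sem):
    """Hypothetical, simplified graph-interaction step.

    X        : (N, C) local features, one row per pixel or region
    W_assign : (C, K) assumed projection that assigns pixels to K visual nodes
    S        : (M, D) assumed word embeddings, one row per class
    W_sem    : (D, C) assumed projection of embeddings into feature space
    """
    # 1) Build the visual graph: softly assign each pixel to K nodes.
    Z = softmax(X @ W_assign, axis=1)              # (N, K), rows sum to 1
    P = (Z.T @ X) / Z.sum(axis=0)[:, None]         # (K, C) node features

    # 2) Interaction: each visual node attends to the semantic nodes.
    S_proj = S @ W_sem                             # (M, C)
    A = softmax(P @ S_proj.T / np.sqrt(X.shape[1]), axis=1)  # (K, M)
    P_evolved = P + A @ S_proj                     # (K, C), residual update

    # 3) Map the evolved node features back to every local feature.
    X_out = X + Z @ P_evolved                      # (N, C)
    return X_out, P_evolved, Z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, C, K, M, D = 12, 8, 4, 5, 6                 # toy sizes, assumptions
    X = rng.normal(size=(N, C))
    X_out, P_evolved, Z = gi_unit_sketch(
        X,
        rng.normal(size=(C, K)),
        rng.normal(size=(M, D)),
        rng.normal(size=(D, C)),
    )
    print(X_out.shape, P_evolved.shape, Z.shape)
```

In a trained network the projections would be learned layers and the semantic embeddings would come from a pretrained word-embedding model; the random matrices here only serve to check shapes.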
Cite
Text
Wu et al. "GINet: Graph Interaction Network for Scene Parsing." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58520-4_3
Markdown
[Wu et al. "GINet: Graph Interaction Network for Scene Parsing." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/wu2020eccv-ginet/) doi:10.1007/978-3-030-58520-4_3
BibTeX
@inproceedings{wu2020eccv-ginet,
title = {{GINet: Graph Interaction Network for Scene Parsing}},
author = {Wu, Tianyi and Lu, Yu and Zhu, Yu and Zhang, Chuang and Wu, Ming and Ma, Zhanyu and Guo, Guodong},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2020},
doi = {10.1007/978-3-030-58520-4_3},
url = {https://mlanthology.org/eccv/2020/wu2020eccv-ginet/}
}