Scene Parsing Through ADE20K Dataset
Abstract
Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision. Despite the community's efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with dense and detailed annotations for scene parsing. In this paper, we introduce and analyze the ADE20K dataset, spanning diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. A scene parsing benchmark is built upon the ADE20K with 150 object and stuff classes included. Several segmentation baseline models are evaluated on the benchmark. A novel network design called Cascade Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a cascade and improve over the baselines. We further show that the trained scene parsing networks can lead to applications such as image content removal and scene synthesis. The dataset and pretrained models are available at http://groups.csail.mit.edu/vision/datasets/ADE20K/.
Cite
Text
Zhou et al. "Scene Parsing Through ADE20K Dataset." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.544
Markdown
[Zhou et al. "Scene Parsing Through ADE20K Dataset." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/zhou2017cvpr-scene/) doi:10.1109/CVPR.2017.544
BibTeX
@inproceedings{zhou2017cvpr-scene,
title = {{Scene Parsing Through ADE20K Dataset}},
author = {Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2017},
doi = {10.1109/CVPR.2017.544},
url = {https://mlanthology.org/cvpr/2017/zhou2017cvpr-scene/}
}