Seamless Scene Segmentation

Abstract

In this work we introduce a novel, CNN-based architecture that can be trained end-to-end to deliver seamless scene segmentation results. Our goal is to predict consistent semantic segmentation and detection results by means of a panoptic output format, going beyond the simple combination of independently trained segmentation and detection models. The proposed architecture takes advantage of a novel segmentation head that seamlessly integrates multi-scale features generated by a Feature Pyramid Network with contextual information conveyed by a light-weight DeepLab-like module. As an additional contribution, we review the panoptic metric and propose an alternative that overcomes its limitations when evaluating non-instance categories. Our proposed network architecture yields state-of-the-art results on three challenging street-level datasets, namely Cityscapes, the Indian Driving Dataset and Mapillary Vistas.
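For readers unfamiliar with the panoptic metric mentioned above, the following is a minimal sketch of the standard per-class panoptic quality (PQ) as commonly defined in the panoptic segmentation literature: the sum of IoUs over matched segment pairs, divided by the number of matches plus half the unmatched predictions and half the unmatched ground-truth segments. It does not implement the alternative metric proposed in the paper, which the abstract does not specify. The function name `panoptic_quality` and its arguments are hypothetical, chosen only for illustration, and the matching step (pairing segments with IoU above 0.5) is assumed to have been done beforehand.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Per-class panoptic quality (illustrative sketch, not the paper's code).

    matched_ious: IoU values of predicted/ground-truth segment pairs that
        were already matched (assumed: each IoU is above 0.5).
    num_false_positives: predicted segments left without a match.
    num_false_negatives: ground-truth segments left without a match.
    """
    num_true_positives = len(matched_ious)
    denominator = (
        num_true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0:
        # Assumption: a class with no segments at all scores 0.0 here;
        # evaluation toolkits often skip such classes instead.
        return 0.0
    return sum(matched_ious) / denominator


if __name__ == "__main__":
    # Two matched segments, one spurious prediction, one missed segment.
    print(panoptic_quality([0.8, 0.6], 1, 1))
```

Because a match requires IoU above 0.5, a single large non-instance region that is predicted slightly too poorly contributes nothing to the numerator while still counting in the denominator, which is one way to read the limitation for non-instance categories that the abstract refers to.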

Cite

Text

Porzi et al. "Seamless Scene Segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.00847

Markdown

[Porzi et al. "Seamless Scene Segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/porzi2019cvpr-seamless/) doi:10.1109/CVPR.2019.00847

BibTeX

@inproceedings{porzi2019cvpr-seamless,
  title     = {{Seamless Scene Segmentation}},
  author    = {Porzi, Lorenzo and Bulo, Samuel Rota and Colovic, Aleksander and Kontschieder, Peter},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2019},
  doi       = {10.1109/CVPR.2019.00847},
  url       = {https://mlanthology.org/cvpr/2019/porzi2019cvpr-seamless/}
}