Stixmantics: A Medium-Level Model for Real-Time Semantic Scene Understanding
Abstract
In this paper we present Stixmantics, a novel medium-level scene representation for real-time visual semantic scene understanding. Relevant scene structure, motion and object class information is encoded using so-called Stixels as primitive elements. Sparse feature-point trajectories are used to estimate the 3D motion field and to enforce temporal consistency of semantic labels. Spatial label coherency is obtained using a CRF framework. The proposed model abstracts and aggregates low-level pixel information to gain robustness and efficiency, yet retains enough flexibility to adequately model complex scenes such as urban traffic. Our experimental evaluation focuses on semantic scene segmentation using a recently introduced dataset for urban traffic scenes. Compared to our best baseline approach, we demonstrate state-of-the-art performance while reducing inference time by a factor of more than 2,000, to only 50 ms per image.
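The abstract does not specify the structure of the CRF, so the following is only a minimal sketch of the general idea of enforcing spatial label coherency over stixels, not the paper's model. It assumes (hypothetically) that stixels form a left-to-right chain, that each stixel has per-class unary costs (e.g. negative log class scores), and that neighbouring stixels with different labels pay a Potts penalty. On a chain, the MAP labeling can be found exactly with Viterbi dynamic programming. The function name `chain_crf_map` and the parameter `potts_weight` are illustrative assumptions.

```python
import numpy as np


def chain_crf_map(unary: np.ndarray, potts_weight: float) -> list[int]:
    """Exact MAP labeling of a chain CRF with Potts pairwise terms (Viterbi).

    unary: (n_nodes, n_labels) array of costs, one row per stixel,
           ordered left to right in the image (hypothetical setup).
    potts_weight: cost added whenever two neighbouring nodes take different labels.
    """
    n, k = unary.shape
    cost = unary[0].copy()                     # best energy ending in each label at node 0
    back = np.zeros((n, k), dtype=int)         # back-pointers for path recovery
    pairwise = potts_weight * (1.0 - np.eye(k))  # 0 on the diagonal, potts_weight elsewhere
    for i in range(1, n):
        # total[a, b] = best energy with node i-1 labeled a and node i labeled b
        total = cost[:, None] + pairwise
        back[i] = np.argmin(total, axis=0)     # best previous label for each current label
        cost = total[back[i], np.arange(k)] + unary[i]
    # Backtrack from the best final label.
    labels = [int(np.argmin(cost))]
    for i in range(n - 1, 0, -1):
        labels.append(int(back[i, labels[-1]]))
    return labels[::-1]
```

For example, with five stixels and two classes where only the middle stixel weakly prefers class 1, a Potts weight of 0.5 smooths it back to class 0, whereas a weight of 0 reproduces the independent per-stixel decisions. The exact chain solver is chosen here only because it keeps the sketch short and correct; a general CRF over a non-chain neighbourhood graph would need approximate inference.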
Cite
Text
Scharwächter et al. "Stixmantics: A Medium-Level Model for Real-Time Semantic Scene Understanding." European Conference on Computer Vision, 2014. doi:10.1007/978-3-319-10602-1_35
Markdown
[Scharwächter et al. "Stixmantics: A Medium-Level Model for Real-Time Semantic Scene Understanding." European Conference on Computer Vision, 2014.](https://mlanthology.org/eccv/2014/scharwachter2014eccv-stixmantics/) doi:10.1007/978-3-319-10602-1_35
BibTeX
@inproceedings{scharwachter2014eccv-stixmantics,
title = {{Stixmantics: A Medium-Level Model for Real-Time Semantic Scene Understanding}},
author = {Scharwächter, Timo and Enzweiler, Markus and Franke, Uwe and Roth, Stefan},
booktitle = {European Conference on Computer Vision},
year = {2014},
pages = {533--548},
doi = {10.1007/978-3-319-10602-1_35},
url = {https://mlanthology.org/eccv/2014/scharwachter2014eccv-stixmantics/}
}