Unsupervised Temporal Consistency Metric for Video Segmentation in Highly-Automated Driving

Abstract

Commonly used metrics for evaluating semantic segmentation, such as mean intersection over union (mIoU), do not incorporate temporal consistency. A straightforward extension of existing metrics towards evaluating the consistency of segmentation of video sequences does not exist, since labelled videos are rare and very expensive to obtain. For safety-critical applications such as highly automated driving, however, there is a need for a metric that measures the temporal consistency of video segmentation networks, which could help support safety requirements. In this paper, (a) we introduce a metric that does not require segmentation labels for measuring the stability of the predictions of segmentation networks over a series of images; (b) we perform an in-depth analysis of the proposed metric and observe strong correlations to the supervised mIoU metric; (c) we evaluate five state-of-the-art semantic segmentation networks of varying complexities and architectures on two public datasets, namely Cityscapes and CamVid. Finally, we perform timing evaluations and propose the use of the metric either as an online observer for identifying possibly unstable segmentation predictions, or as an offline method to evaluate or improve semantic segmentation networks, e.g., by selecting additional training data with critical temporal consistency.
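The abstract does not spell out how the metric is computed, so the following is only an illustrative sketch of one plausible label-free consistency score, not necessarily the authors' formulation. It assumes that the prediction for the previous frame is motion-compensated with a dense displacement field and then compared to the current prediction using mIoU. The function names (`warp_labels`, `mean_iou`, `temporal_consistency`) and the flow convention are hypothetical choices made for this example.

```python
import numpy as np

def warp_labels(prev_labels, flow):
    """Nearest-neighbour backward warp of a label map (illustrative assumption).

    flow[y, x] = (dx, dy) is assumed to point from pixel (x, y) in frame t
    to its source location in frame t-1. Out-of-range sources are clipped.
    """
    h, w = prev_labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return prev_labels[src_y, src_x]

def mean_iou(a, b, num_classes):
    """mIoU between two label maps, averaged over classes present in either."""
    ious = []
    for c in range(num_classes):
        in_a, in_b = (a == c), (b == c)
        union = np.logical_or(in_a, in_b).sum()
        if union == 0:
            continue  # class absent from both maps: skip it
        ious.append(np.logical_and(in_a, in_b).sum() / union)
    return float(np.mean(ious)) if ious else 1.0

def temporal_consistency(pred_prev, pred_curr, flow, num_classes):
    """Score in [0, 1]; no ground-truth segmentation labels are needed."""
    return mean_iou(warp_labels(pred_prev, flow), pred_curr, num_classes)

# Toy usage: a static scene (zero flow) with one pixel flipping class.
prev = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1]])
curr = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1]])
zero_flow = np.zeros((2, 4, 2))
score = temporal_consistency(prev, curr, zero_flow, num_classes=2)
```

In this sketch a low score for a frame pair would flag a possibly unstable prediction; in practice the displacement field would come from an optical flow estimator, whose own errors would also affect the score.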

Cite

Text

Varghese et al. "Unsupervised Temporal Consistency Metric for Video Segmentation in Highly-Automated Driving." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020. doi:10.1109/CVPRW50498.2020.00176

Markdown

[Varghese et al. "Unsupervised Temporal Consistency Metric for Video Segmentation in Highly-Automated Driving." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.](https://mlanthology.org/cvprw/2020/varghese2020cvprw-unsupervised/) doi:10.1109/CVPRW50498.2020.00176

BibTeX

@inproceedings{varghese2020cvprw-unsupervised,
  title     = {{Unsupervised Temporal Consistency Metric for Video Segmentation in Highly-Automated Driving}},
  author    = {Varghese, Serin and Bayzidi, Yasin and Bär, Andreas and Kapoor, Nikhil and Lahiri, Sounak and Schneider, Jan David and Schmidt, Nico M. and Schlicht, Peter and Hüger, Fabian and Fingscheidt, Tim},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2020},
  pages     = {1369-1378},
  doi       = {10.1109/CVPRW50498.2020.00176},
  url       = {https://mlanthology.org/cvprw/2020/varghese2020cvprw-unsupervised/}
}