Semi-Supervised Video Semantic Segmentation with Inter-Frame Feature Reconstruction

Abstract

One major challenge for semantic segmentation in real-world scenarios is that only limited pixel-level labels are available, because annotation requires expensive human labor, even though a vast volume of video data is available. Existing semi-supervised methods attempt to exploit unlabeled data in model training, but they simply treat a video as a set of independent images. To better explore the semi-supervised segmentation problem with video data, we formulate a semi-supervised video semantic segmentation task in this paper. For this task, we observe that overfitting between labeled and unlabeled frames within a training video is surprisingly severe, even though these frames are very similar in style and content. We call this inner-video overfitting, and it leads to inferior performance. To tackle this issue, we propose a novel inter-frame feature reconstruction (IFR) technique that leverages the ground-truth labels to supervise model training on unlabeled frames. IFR essentially exploits the internal relevance among different frames within a video. During training, IFR narrows the gap between the feature distributions of labeled and unlabeled frames. Consequently, the inner-video overfitting issue can be effectively alleviated. We conduct extensive experiments on Cityscapes and CamVid, and the results demonstrate the superiority of our proposed method over previous state-of-the-art methods. The code is available at https://github.com/jfzhuang/IFR.
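The abstract describes IFR only at a high level. To make the idea of "using ground-truth labels to supervise unlabeled frames" concrete, here is a minimal pure-Python sketch of one plausible formulation, not the authors' exact method. It assumes (hypothetically) that each labeled-frame pixel feature is reconstructed as a softmax-attention average over the unlabeled frame's pixel features, and that the reconstruction is then scored against the ground-truth label by a linear classifier with cross-entropy. The names `reconstruct`, `ifr_loss`, `classifier` and the temperature `tau` are illustrative assumptions; the released code at the repository above may differ (for example, in projection heads, prototypes, or how pixels are sampled).

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Numerically stable softmax.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruct(labeled: List[Vector], unlabeled: List[Vector],
                tau: float = 1.0) -> List[Vector]:
    """Hypothetical IFR step: rebuild every labeled-frame pixel feature as an
    attention-weighted average of the unlabeled frame's pixel features."""
    out = []
    for f in labeled:
        weights = softmax([dot(f, u) / tau for u in unlabeled])
        out.append([sum(w * u[c] for w, u in zip(weights, unlabeled))
                    for c in range(len(f))])
    return out


def cross_entropy(logits: List[float], target: int) -> float:
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def ifr_loss(labeled: List[Vector], labels: List[int],
             unlabeled: List[Vector], classifier: List[Vector],
             tau: float = 1.0) -> float:
    """Mean cross-entropy between ground-truth labels of the labeled frame and
    predictions made from features reconstructed out of the unlabeled frame."""
    recon = reconstruct(labeled, unlabeled, tau)
    losses = [cross_entropy([dot(w, r) for w in classifier], y)
              for r, y in zip(recon, labels)]
    return sum(losses) / len(losses)


# Toy example: 2 classes, 2-D features, identity classifier.
labeled = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
classifier = [[1.0, 0.0], [0.0, 1.0]]

# Unlabeled frame whose features cover both classes (distributions aligned).
aligned = ifr_loss(labeled, labels, [[1.0, 0.0], [0.0, 1.0]], classifier, tau=0.1)
# Unlabeled frame whose features only resemble class 0 (distributions drifted).
drifted = ifr_loss(labeled, labels, [[1.0, 0.0], [1.0, 0.0]], classifier, tau=0.1)
print(aligned < drifted)
```

In this toy setting the loss is lower when the unlabeled frame's features cover the labeled frame's feature distribution, which mirrors the abstract's claim that IFR narrows the gap between the two. In an actual training loop the same loss would be computed in an autodiff framework such as PyTorch, so that gradients driven by the ground-truth labels flow into the unlabeled frame's features.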

Cite

Text

Zhuang et al. "Semi-Supervised Video Semantic Segmentation with Inter-Frame Feature Reconstruction." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00326

Markdown

[Zhuang et al. "Semi-Supervised Video Semantic Segmentation with Inter-Frame Feature Reconstruction." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/zhuang2022cvpr-semisupervised/) doi:10.1109/CVPR52688.2022.00326

BibTeX

@inproceedings{zhuang2022cvpr-semisupervised,
  title     = {{Semi-Supervised Video Semantic Segmentation with Inter-Frame Feature Reconstruction}},
  author    = {Zhuang, Jiafan and Wang, Zilei and Gao, Yuan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {3263--3271},
  doi       = {10.1109/CVPR52688.2022.00326},
  url       = {https://mlanthology.org/cvpr/2022/zhuang2022cvpr-semisupervised/}
}