Semi-Supervised Video Semantic Segmentation with Inter-Frame Feature Reconstruction
Abstract
A major challenge for semantic segmentation in real-world scenarios is that only limited pixel-level labels are available, owing to the high cost of human annotation, even though a vast volume of video data is provided. Existing semi-supervised methods attempt to exploit unlabeled data in model training, but they treat a video as a set of independent images. To better explore semi-supervised segmentation with video data, we formulate a semi-supervised video semantic segmentation task in this paper. For this task, we observe that overfitting is surprisingly severe between labeled and unlabeled frames within a training video, even though they are very similar in style and content. We call this inner-video overfitting, and it leads to inferior performance. To tackle this issue, we propose a novel inter-frame feature reconstruction (IFR) technique that leverages the ground-truth labels to supervise model training on unlabeled frames. IFR essentially exploits the internal relevance of different frames within a video. During training, IFR narrows the gap between the feature distributions of labeled and unlabeled frames. Consequently, the inner-video overfitting issue can be effectively alleviated. We conduct extensive experiments on Cityscapes and CamVid, and the results demonstrate the superiority of our proposed method over previous state-of-the-art methods. The code is available at https://github.com/jfzhuang/IFR.
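The abstract does not spell out how the reconstruction is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each labeled-frame feature is rebuilt as a softmax-weighted combination of unlabeled-frame features, so that a loss against the labeled frame's ground truth would send gradients into the unlabeled-frame features. The function names (`cosine`, `reconstruct`), the cosine-similarity weighting, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def reconstruct(query_feats, source_feats, temperature=0.1):
    """Rebuild every query feature as a convex combination of source features.

    Assumed usage (hypothetical): query_feats come from the labeled frame,
    source_feats from an unlabeled frame of the same video.
    """
    dim = len(source_feats[0])
    reconstructed = []
    for q in query_feats:
        # Similarity of this query position to every source position.
        sims = [cosine(q, s) / temperature for s in source_feats]
        # Numerically stable softmax over the source positions.
        peak = max(sims)
        exps = [math.exp(x - peak) for x in sims]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the source features.
        reconstructed.append(
            [sum(w * s[d] for w, s in zip(weights, source_feats)) for d in range(dim)]
        )
    return reconstructed


# Toy features: two "pixels" per frame, two channels each.
labeled = [[1.0, 0.0], [0.0, 1.0]]
unlabeled = [[0.9, 0.1], [0.1, 0.9]]
recon = reconstruct(labeled, unlabeled, temperature=0.05)
```

In a real training loop the reconstructed features would be passed through the segmentation classifier and compared with the labeled frame's ground truth. That step is omitted here because the abstract gives no details about it.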
Cite
Text
Zhuang et al. "Semi-Supervised Video Semantic Segmentation with Inter-Frame Feature Reconstruction." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00326
Markdown
[Zhuang et al. "Semi-Supervised Video Semantic Segmentation with Inter-Frame Feature Reconstruction." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/zhuang2022cvpr-semisupervised/) doi:10.1109/CVPR52688.2022.00326
BibTeX
@inproceedings{zhuang2022cvpr-semisupervised,
title = {{Semi-Supervised Video Semantic Segmentation with Inter-Frame Feature Reconstruction}},
author = {Zhuang, Jiafan and Wang, Zilei and Gao, Yuan},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2022},
pages = {3263-3271},
doi = {10.1109/CVPR52688.2022.00326},
url = {https://mlanthology.org/cvpr/2022/zhuang2022cvpr-semisupervised/}
}