Human-in-the-Loop Video Semantic Segmentation Auto-Annotation

Abstract

Accurate per-pixel semantic class annotations of the entire video are crucial for designing and evaluating video semantic segmentation algorithms. However, annotations are usually limited to a small subset of the video frames due to the high annotation cost and limited budget in practice. In this paper, we propose a novel human-in-the-loop framework called HVSA to generate semantic segmentation annotations for the entire video using only a small annotation budget. Our method alternates between active sample selection and test-time fine-tuning until the annotation quality is satisfactory. In particular, the active sample selection algorithm picks the most important samples for manual annotation, where a sample can be a video frame, a rectangle, or even a super-pixel. The test-time fine-tuning algorithm then propagates the manual annotations of the selected samples to the entire video. Real-world experiments show that our method generates highly accurate and consistent semantic segmentation annotations while enjoying a significantly reduced annotation cost.
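
The sketch below illustrates the alternating loop described in the abstract: select samples, collect manual annotations, fine-tune at test time, and propagate labels until the quality is satisfactory or the budget is exhausted. It is a minimal illustration, not the authors' implementation; all callables (select_samples, annotate, fine_tune, propagate, quality_satisfactory) are hypothetical placeholders supplied by the caller.

def hvsa_annotate(video_frames, model, budget,
                  select_samples, annotate, fine_tune,
                  propagate, quality_satisfactory):
    """Hypothetical human-in-the-loop annotation loop (sketch only)."""
    labeled = {}          # manually annotated samples collected so far
    predictions = None    # propagated annotations for the whole video
    while budget > 0:
        # 1) Actively pick the most important samples for manual annotation
        #    (a sample may be a frame, a rectangle, or a super-pixel).
        samples = select_samples(video_frames, model, labeled)
        # 2) Ask a human annotator to label the selected samples.
        for sample in samples:
            if budget == 0:
                break
            labeled[sample] = annotate(sample)
            budget -= 1
        # 3) Test-time fine-tune on the manual annotations, then
        #    propagate labels to every frame of the video.
        model = fine_tune(model, labeled)
        predictions = propagate(model, video_frames)
        # 4) Stop once the propagated annotations look good enough.
        if quality_satisfactory(predictions, labeled):
            break
    return predictions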

Cite

Text

Qiao et al. "Human-in-the-Loop Video Semantic Segmentation Auto-Annotation." Winter Conference on Applications of Computer Vision, 2023.

Markdown

[Qiao et al. "Human-in-the-Loop Video Semantic Segmentation Auto-Annotation." Winter Conference on Applications of Computer Vision, 2023.](https://mlanthology.org/wacv/2023/qiao2023wacv-humanintheloop/)

BibTeX

@inproceedings{qiao2023wacv-humanintheloop,
  title     = {{Human-in-the-Loop Video Semantic Segmentation Auto-Annotation}},
  author    = {Qiao, Nan and Sun, Yuyin and Liu, Chong and Xia, Lu and Luo, Jiajia and Zhang, Ke and Kuo, Cheng-Hao},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2023},
  pages     = {5881--5891},
  url       = {https://mlanthology.org/wacv/2023/qiao2023wacv-humanintheloop/}
}