SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations

Mahmud, Tanvir; Liu, Chun-Hao; Yaman, Burhaneddin; Marculescu, Diana

WACV 2024 pp. 6773-6782

Abstract

Despite significant progress in semi-supervised learning for image object detection, several key issues remain unaddressed for video object detection: (1) The performance of supervised video object detection depends heavily on the availability of annotated frames. (2) Although frames within a video are highly correlated, collecting annotations for a large number of frames per video is expensive, time-consuming, and often redundant. (3) Existing semi-supervised techniques designed for static images can hardly exploit the temporal motion dynamics inherently present in videos. In this paper, we introduce SSVOD, an end-to-end semi-supervised video object detection framework that exploits the motion dynamics of videos to utilize large-scale unlabeled frames with sparse annotations. To selectively assemble robust pseudo-labels across groups of frames, we introduce flow-warped predictions from nearby frames for temporal-consistency estimation. In particular, we introduce cross-IoU-based and cross-divergence-based selection methods over a set of estimated predictions to include robust pseudo-labels for bounding boxes and class labels, respectively. To strike a balance between confirmation bias and uncertainty noise in pseudo-labels, we propose a confidence-threshold-based combination of hard and soft pseudo-labels. Our method achieves significant performance improvements over existing methods on the ImageNet-VID, Epic-KITCHENS, and YouTube-VIS datasets. Code is available at https://github.com/enyac-group/SSVOD.git.
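
The selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact definitions, so the code below assumes that cross-IoU is the mean best-match IoU between a detection and the flow-warped predictions from each nearby frame, that cross-divergence is the mean KL divergence between the matched class distributions, and that a class label becomes a hard (one-hot) target when its top probability reaches a confidence threshold and stays soft otherwise. The function names, the thresholds `iou_min`, `div_max`, and `conf_threshold`, and the toy inputs are all hypothetical, and the optical-flow warping is assumed to have been applied already.

```python
from math import log


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete class distributions (assumed divergence)."""
    return sum(pi * log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def class_target(probs, conf_threshold):
    """Hard one-hot target above the threshold, soft distribution below it."""
    top = max(probs)
    if top >= conf_threshold:
        k = probs.index(top)
        return [1.0 if i == k else 0.0 for i in range(len(probs))]
    return list(probs)


def select_pseudo_labels(detections, warped_neighbors,
                         iou_min=0.5, div_max=0.5, conf_threshold=0.9):
    """Keep boxes by cross-IoU and class labels by cross-divergence.

    detections: list of (box, class_probs) for the current unlabeled frame.
    warped_neighbors: one list of (box, class_probs) per nearby frame,
        already warped into the current frame (warping is not shown here).
    """
    results = []
    for box, probs in detections:
        ious, divs = [], []
        for frame in warped_neighbors:
            # Match the detection to its best-overlapping warped prediction.
            best = max(frame, key=lambda d: iou(box, d[0]), default=None)
            overlap = iou(box, best[0]) if best is not None else 0.0
            ious.append(overlap)
            if overlap > 0.0:  # compare classes only for real matches
                divs.append(kl_divergence(probs, best[1]))
        cross_iou = sum(ious) / len(ious) if ious else 0.0
        cross_div = sum(divs) / len(divs) if divs else float("inf")
        results.append({
            "box": box if cross_iou >= iou_min else None,
            "cls": class_target(probs, conf_threshold)
                   if cross_div <= div_max else None,
        })
    return results


# Toy example: three detections in the current frame, two warped neighbor frames.
detections = [
    ((0, 0, 10, 10), [0.95, 0.05]),    # consistent and confident
    ((0, 0, 10, 10), [0.70, 0.30]),    # consistent but less confident
    ((50, 50, 60, 60), [0.60, 0.40]),  # no temporal support
]
warped_neighbors = [
    [((1, 0, 11, 10), [0.90, 0.10])],
    [((0, 1, 10, 11), [0.92, 0.08])],
]
labels = select_pseudo_labels(detections, warped_neighbors)
```

In this toy run the first detection keeps its box and receives a one-hot class target, the second keeps its box with a soft class target, and the third is dropped for both box and class because no warped prediction overlaps it. Treating boxes and class labels separately mirrors the abstract's statement that the two selection criteria apply to bounding boxes and class labels, respectively.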

Cite

Text

Mahmud et al. "SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations." Winter Conference on Applications of Computer Vision, 2024.

Markdown

[Mahmud et al. "SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/mahmud2024wacv-ssvod/)

BibTeX

@inproceedings{mahmud2024wacv-ssvod,
  title     = {{SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations}},
  author    = {Mahmud, Tanvir and Liu, Chun-Hao and Yaman, Burhaneddin and Marculescu, Diana},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2024},
  pages     = {6773--6782},
  url       = {https://mlanthology.org/wacv/2024/mahmud2024wacv-ssvod/}
}