Hybrid Active Learning via Deep Clustering for Video Action Detection
Abstract
In this work, we focus on reducing the annotation cost for video action detection, which requires costly frame-wise dense annotations. We study a novel hybrid active learning (AL) strategy which performs efficient labeling using both intra-sample and inter-sample selection. Intra-sample selection reduces the number of frames labeled within a video, whereas inter-sample selection operates at the video level. This hybrid strategy reduces the annotation cost from two different aspects, leading to a significant reduction in labeling cost. The proposed approach utilizes Clustering-Aware Uncertainty Scoring (CLAUS), a novel label acquisition strategy which relies on both informativeness and diversity for sample selection. We also propose a novel Spatio-Temporal Weighted (STeW) loss formulation, which helps in model training under limited annotations. The proposed approach is evaluated on the UCF-101-24 and J-HMDB-21 datasets, where it consistently outperforms other baselines and significantly reduces the annotation cost. Project details are available at https://sites.google.com/view/activesparselabeling/home
Cite
Text
Rana and Rawat. "Hybrid Active Learning via Deep Clustering for Video Action Detection." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01809
Markdown
[Rana and Rawat. "Hybrid Active Learning via Deep Clustering for Video Action Detection." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/rana2023cvpr-hybrid/) doi:10.1109/CVPR52729.2023.01809
BibTeX
@inproceedings{rana2023cvpr-hybrid,
title = {{Hybrid Active Learning via Deep Clustering for Video Action Detection}},
author = {Rana, Aayush J. and Rawat, Yogesh S.},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2023},
pages = {18867-18877},
doi = {10.1109/CVPR52729.2023.01809},
url = {https://mlanthology.org/cvpr/2023/rana2023cvpr-hybrid/}
}