Few-Shot Video Object Detection
Abstract
We introduce Few-Shot Video Object Detection (FSVOD) with three contributions to visual learning in our highly diverse and dynamic world: 1) a large-scale video dataset, FSVOD-500, comprising 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) that generates high-quality video tube proposals for aggregating feature representations of the target video object, which can be highly dynamic; 3) a strategically improved Temporal Matching Network (TMN+) for matching representative query tube features with better discriminative ability, thus achieving higher diversity. Our TPN and TMN+ are trained jointly and end-to-end. Extensive experiments demonstrate that our method produces significantly better detection results on two few-shot video object detection datasets than image-based methods and other naive video-based extensions. Code and datasets are released at https://github.com/fanq15/FewX.
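The abstract describes aggregating features along a video tube and matching the resulting tube feature against a few support examples. The sketch below illustrates only that general idea, under assumptions of our own: each tube is a list of per-frame feature vectors, aggregation is plain mean pooling, and matching is cosine similarity to a class prototype averaged from the support shots. The names `mean_pool`, `cosine`, and `match_tube` are hypothetical; this is not the paper's TPN or TMN+, which are learned networks trained end-to-end.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_pool(frame_features: Sequence[Sequence[float]]) -> Vector:
    """Average per-frame features along a tube into a single tube feature
    (assumption: simple mean pooling stands in for learned aggregation)."""
    if not frame_features:
        raise ValueError("a tube needs at least one frame")
    dim = len(frame_features[0])
    pooled = [0.0] * dim
    for feat in frame_features:
        if len(feat) != dim:
            raise ValueError("all frame features must share one dimension")
        for i, value in enumerate(feat):
            pooled[i] += value
    return [value / len(frame_features) for value in pooled]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_tube(
    tube_frames: Sequence[Sequence[float]],
    support: Dict[str, Sequence[Sequence[float]]],
) -> Tuple[str, float]:
    """Return the support class whose prototype (mean of its few shots)
    is most similar to the pooled query tube feature, plus that score."""
    tube_feature = mean_pool(tube_frames)
    scores = {
        label: cosine(tube_feature, mean_pool(shots))
        for label, shots in support.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Toy 2-D features: three frames of one query tube, two support classes.
    tube = [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]]
    support = {"cat": [[1.0, 0.0], [0.9, 0.0]], "dog": [[0.0, 1.0]]}
    print(match_tube(tube, support))
```

Pooling over the whole tube, rather than scoring each frame on its own, is what lets a temporally consistent object outweigh a single ambiguous frame; a learned aggregator and matcher would replace both toy functions here.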
Cite
Fan et al. "Few-Shot Video Object Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-20044-1_5
BibTeX
@inproceedings{fan2022eccv-fewshot-a,
title = {{Few-Shot Video Object Detection}},
author = {Fan, Qi and Tang, Chi-Keung and Tai, Yu-Wing},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2022},
doi = {10.1007/978-3-031-20044-1_5},
url = {https://mlanthology.org/eccv/2022/fan2022eccv-fewshot-a/}
}