Sylph: A Hypernetwork Framework for Incremental Few-Shot Object Detection
Abstract
We study the challenging incremental few-shot object detection (iFSD) setting. Recently, hypernetwork-based approaches have been studied in the context of continuous and finetune-free iFSD with limited success. We take a closer look at important design choices of such methods, leading to several key improvements and resulting in a more accurate and flexible framework, which we call Sylph. In particular, we demonstrate the effectiveness of decoupling object classification from localization by leveraging a base detector that is pretrained for class-agnostic localization on a large-scale dataset. Contrary to what previous results have suggested, we show that with a carefully designed class-conditional hypernetwork, finetune-free iFSD can be highly effective, especially when a large number of base categories with abundant data are available for meta-training, almost approaching alternatives that undergo test-time training. This result is even more significant considering its many practical advantages: (1) incrementally learning new classes in sequence without additional training, (2) detecting both novel and seen classes in a single pass, and (3) no forgetting of previously seen classes. We benchmark our model on both COCO and LVIS, reporting as high as 17% AP on the long-tail rare classes on LVIS, indicating the promise of hypernetwork-based iFSD.
Cite
Text
Yin et al. "Sylph: A Hypernetwork Framework for Incremental Few-Shot Object Detection." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00883
Markdown
[Yin et al. "Sylph: A Hypernetwork Framework for Incremental Few-Shot Object Detection." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/yin2022cvpr-sylph/) doi:10.1109/CVPR52688.2022.00883
BibTeX
@inproceedings{yin2022cvpr-sylph,
title = {{Sylph: A Hypernetwork Framework for Incremental Few-Shot Object Detection}},
author = {Yin, Li and Perez-Rua, Juan M. and Liang, Kevin J.},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2022},
pages = {9035-9045},
doi = {10.1109/CVPR52688.2022.00883},
url = {https://mlanthology.org/cvpr/2022/yin2022cvpr-sylph/}
}