SNIDA: Unlocking Few-Shot Object Detection with Non-Linear Semantic Decoupling Augmentation

Abstract

When only a few annotated samples are available, the performance of learning-based object detectors drops sharply. Many few-shot object detection (FSOD) methods tackle this issue by applying linear, image-level augmentations. However, such handcrafted augmentations often suffer from limited diversity and a lack of semantic awareness, resulting in unsatisfactory performance. To this end, we propose a Semantic-guided Non-linear Instance-level Data Augmentation method (SNIDA) for FSOD that decouples the foreground from the background and increases the diversity of each separately. We design a semantic awareness enhancement strategy to separate objects from backgrounds. Concretely, instance masks are extracted by an unsupervised semantic segmentation module, and sample diversity is then improved by fusing the instances into different backgrounds. Because traditional data augmentation methods operate within a limited transformation space, we also introduce an object reconstruction enhancement module, which aims to generate sufficiently diverse, non-linear training data at the instance level through a semantic-guided masked autoencoder. In this way, the potential of the data can be fully exploited in various object detection scenarios. Extensive experiments on PASCAL VOC and MS-COCO demonstrate that the proposed method outperforms baselines by a large margin and achieves new state-of-the-art results under different shot settings.
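The foreground/background decoupling step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `paste_instance`, its parameters, and the hard (non-blended) compositing are all assumptions made for illustration, and the instance mask is taken as given rather than produced by an unsupervised segmentation module.

```python
import numpy as np


def paste_instance(background, image, mask, top, left):
    """Paste the masked instance from `image` onto a copy of `background`.

    Hypothetical helper, not from the paper. `image` is H x W x C, `mask` is
    an H x W binary instance mask, and (top, left) is the non-negative
    position of the instance's tight bounding box in `background`.
    Returns (composite, box) with box = (x_min, y_min, x_max, y_max),
    or (copy of background, None) if nothing could be pasted.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return background.copy(), None

    # Crop the instance and its mask to the tight bounding box of the mask.
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    crop = image[y0:y1, x0:x1]
    crop_mask = mask[y0:y1, x0:x1].astype(bool)

    # Clip the pasted region so it stays inside the background.
    bg_h, bg_w = background.shape[:2]
    h = min(crop_mask.shape[0], bg_h - top)
    w = min(crop_mask.shape[1], bg_w - left)
    if h <= 0 or w <= 0:
        return background.copy(), None

    out = background.copy()
    region = out[top:top + h, left:left + w]  # view into `out`
    m = crop_mask[:h, :w]
    region[m] = crop[:h, :w][m]  # overwrite only foreground pixels

    return out, (left, top, left + w, top + h)
```

Calling this repeatedly with different backgrounds and positions yields several training samples from one annotated instance, each with an updated bounding box. The reconstruction module based on a masked autoencoder is not sketched here, since the abstract does not specify how the semantic guidance is applied.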

Cite

Text

Wang et al. "SNIDA: Unlocking Few-Shot Object Detection with Non-Linear Semantic Decoupling Augmentation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01192

Markdown

[Wang et al. "SNIDA: Unlocking Few-Shot Object Detection with Non-Linear Semantic Decoupling Augmentation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/wang2024cvpr-snida/) doi:10.1109/CVPR52733.2024.01192

BibTeX

@inproceedings{wang2024cvpr-snida,
  title     = {{SNIDA: Unlocking Few-Shot Object Detection with Non-Linear Semantic Decoupling Augmentation}},
  author    = {Wang, Yanjie and Zou, Xu and Yan, Luxin and Zhong, Sheng and Zhou, Jiahuan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {12544--12553},
  doi       = {10.1109/CVPR52733.2024.01192},
  url       = {https://mlanthology.org/cvpr/2024/wang2024cvpr-snida/}
}