Semantic-Aligned Fusion Transformer for One-Shot Object Detection

Abstract

One-shot object detection aims at detecting novel objects according to merely one given instance. With extreme data scarcity, current approaches explore various feature fusions to obtain directly transferable meta-knowledge. Yet, their performance is often unsatisfactory. In this paper, we attribute this to inappropriate correlation methods that misalign query-support semantics by overlooking spatial structures and scale variances. Upon analysis, we leverage the attention mechanism and propose a simple but effective architecture named Semantic-aligned Fusion Transformer (SaFT) to resolve these issues. Specifically, we equip SaFT with a vertical fusion module (VFM) for cross-scale semantic enhancement and a horizontal fusion module (HFM) for cross-sample feature fusion. Together, they broaden the vision for each feature point from the support to a whole augmented feature pyramid from the query, facilitating semantic-aligned associations. Extensive experiments on multiple benchmarks demonstrate the superiority of our framework. Without fine-tuning on novel classes, it brings significant performance gains to one-stage baselines, lifting state-of-the-art results to a higher level.
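
As a rough illustration of the idea that each support feature point can attend to every position of a multi-scale query feature pyramid, the sketch below applies plain scaled dot-product attention over a flattened pyramid. It is not the paper's VFM or HFM: the function names `softmax` and `cross_attend`, the list-of-lists feature layout, and the absence of learned projections or multiple heads are all assumptions made to keep the example small.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(support_points, query_pyramid):
    """Fuse each support feature point with all query pyramid positions.

    support_points: list of feature vectors from the support image.
    query_pyramid:  list of levels, each a list of feature vectors
                    (hypothetical layout; all vectors share one dimension).
    Returns one fused vector per support point.
    """
    # Flatten every level so attention spans the whole pyramid at once.
    keys = [vec for level in query_pyramid for vec in level]
    dim = len(keys[0])
    fused = []
    for point in support_points:
        # Scaled dot-product similarity between this point and each position.
        scores = [
            sum(a * b for a, b in zip(point, key)) / math.sqrt(dim)
            for key in keys
        ]
        weights = softmax(scores)
        # Weighted sum of query features (keys double as values here).
        fused.append(
            [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]
        )
    return fused
```

Because the attention weights are computed jointly over all levels, a support point is not tied to a single scale of the query, which is the property the abstract attributes to the combined modules.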

Cite

Text

Zhao et al. "Semantic-Aligned Fusion Transformer for One-Shot Object Detection." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00745

Markdown

[Zhao et al. "Semantic-Aligned Fusion Transformer for One-Shot Object Detection." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/zhao2022cvpr-semanticaligned/) doi:10.1109/CVPR52688.2022.00745

BibTeX

@inproceedings{zhao2022cvpr-semanticaligned,
  title     = {{Semantic-Aligned Fusion Transformer for One-Shot Object Detection}},
  author    = {Zhao, Yizhou and Guo, Xun and Lu, Yan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {7601--7611},
  doi       = {10.1109/CVPR52688.2022.00745},
  url       = {https://mlanthology.org/cvpr/2022/zhao2022cvpr-semanticaligned/}
}