Rethinking Features-Fused-Pyramid-Neck for Object Detection

Abstract

Multi-head detectors typically employ a features-fused-pyramid-neck for multi-scale detection and are widely adopted in industry. However, this approach suffers from feature misalignment when representations from different hierarchical levels of the feature pyramid are forcibly fused point-to-point. To address this issue, we design an independent hierarchy pyramid (IHP) architecture to evaluate the effectiveness of the features-unfused-pyramid-neck for multi-head detectors. We then introduce soft nearest neighbor interpolation (SNI) with a weight-downscaling factor to mitigate the impact of feature fusion across hierarchies while preserving key textures. Furthermore, we present a feature adaptive selection method for downsampling in extended spatial windows (ESD) to retain spatial features and enhance lightweight convolutional techniques (GSConvE). These advancements culminate in our secondary features alignment solution (SA) for real-time detection, achieving state-of-the-art results on Pascal VOC and MS COCO. Code will be released at https://github.com/AlanLi1997/rethinking-fpn.
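
To make the idea of nearest neighbor interpolation with a weight-downscaling factor concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not give the value of the factor; the function name `soft_nearest_upsample`, the parameter `alpha`, and the default choice `alpha = 1 / scale**2` are assumptions made purely for illustration.

```python
def soft_nearest_upsample(feature, scale=2, alpha=None):
    """Nearest-neighbor upsampling of a 2D map, followed by a down-weighting.

    `alpha` is a hypothetical weight-downscaling factor. The default
    1 / scale**2 is an assumption for this sketch, not a value taken
    from the abstract.
    """
    if alpha is None:
        alpha = 1.0 / (scale * scale)
    upsampled = []
    for row in feature:
        # Repeat each value `scale` times along the width, scaled by alpha.
        stretched = [alpha * value for value in row for _ in range(scale)]
        # Repeat the stretched row `scale` times along the height.
        for _ in range(scale):
            upsampled.append(list(stretched))
    return upsampled


# A 2x2 coarse-level map upsampled to 4x4 before any fusion with a finer level.
coarse = [[4.0, 8.0], [12.0, 16.0]]
up = soft_nearest_upsample(coarse, scale=2)
```

With this assumed factor, each coarse value is spread over a `scale` by `scale` block whose entries sum to the original value, so the upsampled map carries the same total activation as the coarse map and contributes less per pixel to any subsequent point-to-point fusion.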

Cite

Text

Li. "Rethinking Features-Fused-Pyramid-Neck for Object Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72855-6_5

Markdown

[Li. "Rethinking Features-Fused-Pyramid-Neck for Object Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/li2024eccv-rethinking/) doi:10.1007/978-3-031-72855-6_5

BibTeX

@inproceedings{li2024eccv-rethinking,
  title     = {{Rethinking Features-Fused-Pyramid-Neck for Object Detection}},
  author    = {Li, Hulin},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72855-6_5},
  url       = {https://mlanthology.org/eccv/2024/li2024eccv-rethinking/}
}