End-to-End Object Detection with Fully Convolutional Network

Abstract

Mainstream object detectors based on the fully convolutional network have achieved impressive performance. However, most of them still need a hand-designed non-maximum suppression (NMS) post-processing step, which impedes fully end-to-end training. In this paper, we analyze the effect of discarding NMS, and the results reveal that proper label assignment plays a crucial role. To this end, for fully convolutional detectors, we introduce a Prediction-aware One-To-One (POTO) label assignment for classification to enable end-to-end detection, which obtains performance comparable to that of detectors using NMS. In addition, a simple 3D Max Filtering (3DMF) is proposed to utilize the multi-scale features and improve the discriminability of convolutions in the local region. With these techniques, our end-to-end framework achieves competitive performance against many state-of-the-art detectors with NMS on the COCO and CrowdHuman datasets. The code is available at https://github.com/Megvii-BaseDetection/DeFCN.
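To make the idea of a prediction-aware one-to-one assignment concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a matching quality that blends the classification score and the IoU through an exponent `alpha`; that form, the function names `iou` and `one_to_one_assign`, and the brute-force search are all illustrative assumptions not stated in the abstract. Each ground-truth box receives exactly one prediction, and every other prediction is treated as background, which is what removes the need for duplicate suppression.

```python
from itertools import permutations


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_to_one_assign(gt_boxes, pred_boxes, pred_scores, alpha=0.8):
    """Return {gt_index: pred_index}; each prediction is used at most once.

    ASSUMPTION: the quality (score and IoU blended by alpha) is an
    illustrative form, not a formula given in the abstract.
    """
    quality = [
        [(s ** (1 - alpha)) * (iou(g, p) ** alpha)
         for p, s in zip(pred_boxes, pred_scores)]
        for g in gt_boxes
    ]
    best, best_total = None, -1.0
    # Brute force over all injective mappings; suitable for toy sizes only.
    for perm in permutations(range(len(pred_boxes)), len(gt_boxes)):
        total = sum(quality[g][p] for g, p in enumerate(perm))
        if total > best_total:
            best, best_total = perm, total
    # best stays None when there are more ground truths than predictions.
    return dict(enumerate(best)) if best is not None else {}


gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
pred_scores = [0.9, 0.8, 0.7]
print(one_to_one_assign(gt_boxes, pred_boxes, pred_scores))  # {0: 0, 1: 2}
```

In this toy case, prediction 1 overlaps the first object but is left unassigned, so it would be trained as background instead of being filtered out afterwards by NMS. At realistic scale, a polynomial-time bipartite matching solver would replace the exhaustive search.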

Cite

Text

Wang et al. "End-to-End Object Detection with Fully Convolutional Network." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.01559

Markdown

[Wang et al. "End-to-End Object Detection with Fully Convolutional Network." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/wang2021cvpr-endtoend/) doi:10.1109/CVPR46437.2021.01559

BibTeX

@inproceedings{wang2021cvpr-endtoend,
  title     = {{End-to-End Object Detection with Fully Convolutional Network}},
  author    = {Wang, Jianfeng and Song, Lin and Li, Zeming and Sun, Hongbin and Sun, Jian and Zheng, Nanning},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {15849--15858},
  doi       = {10.1109/CVPR46437.2021.01559},
  url       = {https://mlanthology.org/cvpr/2021/wang2021cvpr-endtoend/}
}