Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective

Abstract

With the transformative impact of the Transformer, DETR pioneered the application of the encoder-decoder architecture to object detection. A collection of follow-up research, e.g., Deformable DETR, aims to enhance DETR while adhering to the encoder-decoder design. In this work, we revisit the DETR series through the lens of Faster R-CNN. We find that DETR resonates with the underlying principles of Faster R-CNN's RPN-refiner design but benefits from end-to-end detection owing to the incorporation of Hungarian matching. We systematically adapt Faster R-CNN towards Deformable DETR by integrating or repurposing each component of Deformable DETR, and note that Deformable DETR's improved performance over Faster R-CNN is attributed to the adoption of advanced modules such as a superior proposal refiner (e.g., deformable attention rather than RoI Align). When viewing DETR through the RPN-refiner paradigm, we delve into various proposal refinement techniques such as deformable attention, cross attention, and dynamic convolution. These proposal refiners cooperate well with each other; thus, we synergistically combine them to establish a Hybrid Proposal Refiner (HPR). Our HPR is versatile and can be incorporated into various DETR detectors. For instance, by integrating HPR into a strong DETR detector, we achieve an AP of 54.9 on the COCO benchmark, utilizing a ResNet-50 backbone and a 36-epoch training schedule. Code and models are available at https://github.com/ZhaoJingjing713/HPR.
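The abstract credits Hungarian matching for DETR's end-to-end behavior: each ground-truth box is assigned to exactly one prediction, so duplicate predictions are trained as "no object" rather than being suppressed afterwards. The following is only a minimal sketch of that one-to-one assignment idea, not the authors' implementation. It assumes a plain L1 box cost (DETR-style detectors typically also include classification and IoU-based terms), and it uses brute-force search over assignments as a stand-in for the Hungarian algorithm, which is practical only for tiny inputs. The names `box_l1` and `match_one_to_one` are hypothetical.

```python
from itertools import permutations


def box_l1(a, b):
    """L1 distance between two boxes given as (x1, y1, x2, y2). Assumed cost."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_one_to_one(pred_boxes, gt_boxes):
    """Return sorted (pred_idx, gt_idx) pairs with minimal total L1 cost.

    Brute-force stand-in for the Hungarian algorithm: it tries every way of
    assigning distinct predictions to the ground-truth boxes.
    """
    n_pred, n_gt = len(pred_boxes), len(gt_boxes)
    if n_gt > n_pred:
        raise ValueError("need at least as many predictions as ground truths")
    best_cost, best_pairs = float("inf"), []
    for pred_ids in permutations(range(n_pred), n_gt):
        cost = sum(box_l1(pred_boxes[p], gt_boxes[g])
                   for g, p in enumerate(pred_ids))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(pred_ids)]
    return sorted(best_pairs)


# Toy data: predictions 0 and 2 are near-duplicates of the first ground truth.
preds = [(0.1, 0.1, 0.4, 0.4), (0.5, 0.5, 0.9, 0.9), (0.12, 0.1, 0.42, 0.4)]
gts = [(0.1, 0.1, 0.4, 0.4), (0.5, 0.5, 0.9, 1.0)]

pairs = match_one_to_one(preds, gts)
matched = {p for p, _ in pairs}
# Predictions left unmatched would be supervised as "no object".
unmatched = [i for i in range(len(preds)) if i not in matched]
print(pairs, unmatched)
```

In this toy case, the near-duplicate prediction at index 2 is left unmatched, which is the behavior that lets a set-prediction detector avoid non-maximum suppression.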

Cite

Text

Zhao et al. "Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01649

Markdown

[Zhao et al. "Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/zhao2024cvpr-hybrid/) doi:10.1109/CVPR52733.2024.01649

BibTeX

@inproceedings{zhao2024cvpr-hybrid,
  title     = {{Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective}},
  author    = {Zhao, Jinjing and Wei, Fangyun and Xu, Chang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {17416--17426},
  doi       = {10.1109/CVPR52733.2024.01649},
  url       = {https://mlanthology.org/cvpr/2024/zhao2024cvpr-hybrid/}
}