Fast Convergence of DETR with Spatially Modulated Co-Attention

Abstract

The recently proposed Detection Transformer (DETR) model successfully applies the Transformer to object detection and achieves performance comparable to two-stage object detection frameworks such as Faster R-CNN. However, DETR suffers from slow convergence: training it from scratch takes 500 epochs to reach high accuracy. To accelerate convergence, we propose a simple yet effective scheme for improving the DETR framework, namely the Spatially Modulated Co-Attention (SMCA) mechanism. The core idea of SMCA is to conduct location-aware co-attention in DETR by constraining co-attention responses to be high near initially estimated bounding box locations. SMCA increases DETR's convergence speed by replacing the original co-attention mechanism in the decoder while keeping all other operations in DETR unchanged. Furthermore, by integrating multi-head and scale-selection attention designs into SMCA, our fully-fledged SMCA outperforms DETR with a dilated convolution-based backbone (45.6 mAP at 108 epochs vs. 43.3 mAP at 500 epochs). We perform extensive ablation studies on the COCO dataset to validate SMCA. Code is released at https://github.com/gaopengcuhk/SMCA-DETR.
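The location-aware co-attention idea can be illustrated with a toy sketch. This is not the released implementation. It assumes that the spatial constraint takes the form of a Gaussian-like weight map centered on an estimated object location, and that this map is added in log space to the attention logits before the softmax. All function names, the `beta` parameter, and the toy feature-map size are hypothetical choices made for illustration.

```python
import math


def gaussian_log_prior(height, width, center_x, center_y, scale_x, scale_y, beta=1.0):
    """Log of a Gaussian-like weight map peaked at (center_x, center_y).

    Assumption: the spatial prior is an axis-aligned Gaussian whose widths
    are controlled by scale_x, scale_y and a hypothetical factor beta.
    Returns a flattened row-major list of length height * width.
    """
    prior = []
    for y in range(height):
        for x in range(width):
            dx = (x - center_x) ** 2 / (beta * scale_x ** 2)
            dy = (y - center_y) ** 2 / (beta * scale_y ** 2)
            prior.append(-(dx + dy))
    return prior


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modulated_attention(logits, log_prior):
    """Add the log prior to the raw attention logits, then normalize."""
    return softmax([a + b for a, b in zip(logits, log_prior)])


# Toy example: uniform content logits on a 5x5 feature map, with the
# estimated object center at x=3, y=1. With no content signal, the
# attention weights simply follow the spatial prior.
height, width = 5, 5
logits = [0.0] * (height * width)
log_prior = gaussian_log_prior(
    height, width, center_x=3, center_y=1, scale_x=1.0, scale_y=1.0
)
weights = modulated_attention(logits, log_prior)

peak = max(range(len(weights)), key=weights.__getitem__)
peak_y, peak_x = divmod(peak, width)
print(f"attention peak at x={peak_x}, y={peak_y}")
```

In this sketch the prior only biases the attention: positions far from the estimated center can still receive weight if their content logits are large enough, which is one plausible reading of "constraining co-attention responses to be high near" the estimated locations.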

Cite

Text

Gao et al. "Fast Convergence of DETR with Spatially Modulated Co-Attention." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00360

Markdown

[Gao et al. "Fast Convergence of DETR with Spatially Modulated Co-Attention." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/gao2021iccv-fast-a/) doi:10.1109/ICCV48922.2021.00360

BibTeX

@inproceedings{gao2021iccv-fast-a,
  title     = {{Fast Convergence of DETR with Spatially Modulated Co-Attention}},
  author    = {Gao, Peng and Zheng, Minghang and Wang, Xiaogang and Dai, Jifeng and Li, Hongsheng},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {3621-3630},
  doi       = {10.1109/ICCV48922.2021.00360},
  url       = {https://mlanthology.org/iccv/2021/gao2021iccv-fast-a/}
}