Q-DETR: An Efficient Low-Bit Quantized Detection Transformer

Abstract

The recent detection transformer (DETR) has advanced object detection, but deploying it on resource-constrained devices is hindered by its heavy computation and memory requirements. Quantization stands out as a solution by representing the network in low-bit parameters and operations. However, existing quantization methods cause a significant performance drop when applied to a low-bit quantized DETR (Q-DETR). Our empirical analyses show that the bottleneck of Q-DETR is the distortion of query information. We address this problem with a distribution rectification distillation (DRD). We formulate DRD as a bi-level optimization problem, which can be derived by generalizing the information bottleneck (IB) principle to the learning of Q-DETR. At the inner level, we conduct a distribution alignment for the queries to maximize the self-information entropy. At the upper level, we introduce a new foreground-aware query matching scheme that transfers the teacher information to distillation-desired features to minimize the conditional information entropy. Extensive experimental results show that our method performs much better than prior art. For example, the 4-bit Q-DETR can theoretically accelerate DETR with a ResNet-50 backbone by 6.6x and achieve 39.4% AP, a performance gap of only 2.6% from its real-valued counterpart on the COCO dataset.
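
As a rough illustration of what "low-bit parameters" means in the abstract, the following minimal Python sketch applies generic symmetric uniform quantization to a list of weights. It is not the Q-DETR method or the DRD scheme; the function name `quantize_uniform` and the max-absolute-value scaling rule are assumptions chosen for illustration only.

```python
def quantize_uniform(values, bits=4):
    """Symmetric uniform quantization of floats to signed `bits`-bit integer codes.

    Hypothetical helper for illustration; not taken from the Q-DETR paper.
    Returns the integer codes, the dequantized floats, and the scale used.
    """
    qmax = 2 ** (bits - 1) - 1   # largest signed code, e.g. 7 for 4 bits
    qmin = -(2 ** (bits - 1))    # smallest signed code, e.g. -8 for 4 bits

    # Assumed scaling rule: map the largest magnitude onto the largest code.
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0

    # Round to the nearest code and clip into the representable range.
    codes = [min(qmax, max(qmin, round(v / scale))) for v in values]

    # Dequantize to see the values a low-bit network would effectively use.
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


if __name__ == "__main__":
    weights = [-1.0, -0.4, 0.0, 0.3, 0.9]
    codes, approx, scale = quantize_uniform(weights, bits=4)
    print("codes:", codes)
    print("dequantized:", approx)
    print("scale:", scale)
```

With 4 bits, each value is restricted to one of at most 16 levels, so the rounding error per value is bounded by half the scale. Errors of this kind are what the abstract reports as distorting the query information in a quantized DETR.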

Cite

Text

Xu et al. "Q-DETR: An Efficient Low-Bit Quantized Detection Transformer." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00374

Markdown

[Xu et al. "Q-DETR: An Efficient Low-Bit Quantized Detection Transformer." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/xu2023cvpr-qdetr/) doi:10.1109/CVPR52729.2023.00374

BibTeX

@inproceedings{xu2023cvpr-qdetr,
  title     = {{Q-DETR: An Efficient Low-Bit Quantized Detection Transformer}},
  author    = {Xu, Sheng and Li, Yanjing and Lin, Mingbao and Gao, Peng and Guo, Guodong and Lü, Jinhu and Zhang, Baochang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {3842-3851},
  doi       = {10.1109/CVPR52729.2023.00374},
  url       = {https://mlanthology.org/cvpr/2023/xu2023cvpr-qdetr/}
}