Q-ViT: Accurate and Fully Quantized Low-Bit Vision Transformer

Abstract

Large pre-trained vision transformers (ViTs) have demonstrated remarkable performance on various visual tasks, but suffer from high computational and memory costs when deployed on resource-constrained devices. Among the powerful compression approaches, quantization drastically reduces computation and memory consumption through low-bit parameters and bit-wise operations. However, low-bit ViTs remain largely unexplored and usually suffer a significant performance drop compared with their real-valued counterparts. In this work, through extensive empirical analysis, we first identify that the bottleneck behind the severe performance drop is the information distortion of the low-bit quantized self-attention map. We then develop an information rectification module (IRM) and a distribution guided distillation (DGD) scheme that effectively eliminate such distortion, leading to fully quantized vision transformers (Q-ViT). We evaluate our method on the popular DeiT and Swin backbones. Extensive experimental results show that our method performs much better than the prior arts. For example, Q-ViT can theoretically accelerate ViT-S by 6.14x while achieving about 80.9% Top-1 accuracy, even surpassing the full-precision counterpart by 1.0% on the ImageNet dataset. Our code and models are available at https://github.com/YanjingLi0202/Q-ViT
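
The abstract does not spell out the internals of IRM or DGD, so the snippet below is only a minimal, illustrative PyTorch sketch of the setting the paper addresses: low-bit uniform quantization of the queries, keys, values, and self-attention map, trained with a straight-through estimator. All names here (UniformQuantizer, QuantizedSelfAttention, n_bits) are hypothetical and are not taken from the Q-ViT codebase.

import torch
import torch.nn as nn
import torch.nn.functional as F

class UniformQuantizer(nn.Module):
    # Symmetric uniform quantizer with a straight-through estimator (STE).
    def __init__(self, n_bits=4):
        super().__init__()
        self.n_levels = 2 ** (n_bits - 1) - 1  # e.g. 7 levels per sign for 4-bit

    def forward(self, x):
        scale = x.abs().max().clamp(min=1e-8) / self.n_levels
        q = torch.round(x / scale).clamp(-self.n_levels, self.n_levels) * scale
        # STE: forward pass uses quantized values; gradients flow through x.
        return x + (q - x).detach()

class QuantizedSelfAttention(nn.Module):
    # Single-head self-attention with low-bit quantized Q/K/V and attention map.
    def __init__(self, dim, n_bits=4):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.quant = UniformQuantizer(n_bits)
        self.scale = dim ** -0.5

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = self.quant(q), self.quant(k), self.quant(v)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        # Quantizing the softmax output is the step the paper identifies as
        # the main source of information distortion in low-bit ViTs.
        attn = self.quant(attn)
        return self.proj(attn @ v)

x = torch.randn(2, 197, 64)            # (batch, tokens, dim), ViT-style shapes
y = QuantizedSelfAttention(64)(x)
print(y.shape)                          # torch.Size([2, 197, 64])

In this naive form, rounding the near-uniform low-entropy softmax output collapses many attention weights onto the same level; the paper's IRM and DGD are designed to counteract exactly that distortion.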

Cite

Text

Li et al. "Q-ViT: Accurate and Fully Quantized Low-Bit Vision Transformer." Neural Information Processing Systems, 2022.

Markdown

[Li et al. "Q-ViT: Accurate and Fully Quantized Low-Bit Vision Transformer." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/li2022neurips-qvit/)

BibTeX

@inproceedings{li2022neurips-qvit,
  title     = {{Q-ViT: Accurate and Fully Quantized Low-Bit Vision Transformer}},
  author    = {Li, Yanjing and Xu, Sheng and Zhang, Baochang and Cao, Xianbin and Gao, Peng and Guo, Guodong},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/li2022neurips-qvit/}
}