Bi-ViT: Pushing the Limit of Vision Transformer Quantization
Abstract
Vision transformer (ViT) quantization offers a promising way to deploy large pre-trained networks on resource-limited devices. Fully binarized ViTs (Bi-ViT), which push ViT quantization to its limit, remain largely unexplored and pose a very challenging task due to their unacceptable performance drop. Through extensive empirical analyses, we identify that the severe degradation under ViT binarization is caused by attention distortion in self-attention, which stems from gradient vanishing and ranking disorder. To address these issues, we first introduce a learnable scaling factor to reactivate the vanished gradients and illustrate its effectiveness through theoretical and experimental analyses. We then propose a ranking-aware distillation method to rectify the disordered ranking in a teacher-student framework. Bi-ViT achieves significant improvements over popular DeiT and Swin backbones in terms of Top-1 accuracy and FLOPs. For example, with DeiT-Tiny and Swin-Tiny, our method outperforms the baselines by 22.1% and 21.4% respectively, while delivering 61.5x and 56.1x theoretical acceleration in FLOPs compared with the real-valued counterparts on ImageNet. Our code and models are available at https://github.com/YanjingLi0202/Bi-ViT/ .
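The abstract mentions 1-bit (binarized) weights rescaled by a learnable scaling factor. As a rough illustration only, here is a minimal NumPy sketch of binarization with a per-tensor scale; the function name `binarize` and the mean-of-absolute-values initialization are common conventions in the binarization literature, not necessarily the authors' exact formulation (which also involves a straight-through estimator for gradients during training).

```python
import numpy as np

def binarize(x, alpha):
    # 1-bit quantization: each entry becomes +alpha or -alpha.
    # In training, gradients would flow through sign() via a
    # straight-through estimator; only the forward pass is shown here.
    return alpha * np.sign(x)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))          # hypothetical real-valued weights
alpha = np.abs(w).mean()             # common initialization for the scale
w_b = binarize(w, alpha)
print(np.unique(w_b))                # two values: -alpha and +alpha
```

A full binary ViT would apply such a quantizer to weights and activations throughout the network, with `alpha` learned jointly with the other parameters.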
Cite
Text
Li et al. "Bi-ViT: Pushing the Limit of Vision Transformer Quantization." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I4.28109
Markdown
[Li et al. "Bi-ViT: Pushing the Limit of Vision Transformer Quantization." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/li2024aaai-bi/) doi:10.1609/AAAI.V38I4.28109
BibTeX
@inproceedings{li2024aaai-bi,
title = {{Bi-ViT: Pushing the Limit of Vision Transformer Quantization}},
author = {Li, Yanjing and Xu, Sheng and Lin, Mingbao and Cao, Xianbin and Liu, Chuanjian and Sun, Xiao and Zhang, Baochang},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2024},
pages = {3243-3251},
doi = {10.1609/AAAI.V38I4.28109},
url = {https://mlanthology.org/aaai/2024/li2024aaai-bi/}
}