QDrop: Randomly Dropping Quantization for Extremely Low-Bit Post-Training Quantization
Abstract
Recently, post-training quantization (PTQ) has attracted much attention as a way to produce efficient neural networks without lengthy retraining. Despite its low cost, current PTQ methods often fail under the extremely low-bit setting. In this study, we are the first to confirm that properly incorporating activation quantization into the PTQ reconstruction benefits the final accuracy. To understand the underlying reason, we establish a theoretical framework, which indicates that the flatness of the optimized low-bit model on calibration and test data is crucial. Based on this conclusion, we propose a simple yet effective approach dubbed QDrop, which randomly drops the quantization of activations during reconstruction. Extensive experiments on various tasks, including computer vision (image classification, object detection) and natural language processing (text classification and question answering), demonstrate its superiority. With QDrop, the limit of PTQ is pushed to 2-bit activations for the first time, and the accuracy boost can be up to 51.49%. Without bells and whistles, QDrop establishes a new state of the art for PTQ.
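To make the core idea concrete, below is a minimal PyTorch-style sketch of randomly dropping activation quantization during reconstruction. The class name `QDropActivationQuant`, the element-wise drop probability, and the min-max calibration are illustrative assumptions for this sketch, not the authors' reference implementation (which also handles gradient estimation and per-block reconstruction).

```python
import torch
import torch.nn as nn


class QDropActivationQuant(nn.Module):
    """Fake-quantizes activations, but randomly keeps elements in full
    precision during reconstruction (a sketch of the random-drop idea).
    Gradient handling (e.g., straight-through estimation) is omitted."""

    def __init__(self, n_bits: int = 2, drop_prob: float = 0.5):
        super().__init__()
        self.n_levels = 2 ** n_bits
        self.drop_prob = drop_prob
        self.scale = None       # calibrated from data in practice
        self.zero_point = None

    def fake_quantize(self, x: torch.Tensor) -> torch.Tensor:
        # Simple asymmetric uniform quantization with a min-max scale.
        if self.scale is None:
            self.scale = (x.max() - x.min()) / (self.n_levels - 1)
            self.zero_point = torch.round(-x.min() / self.scale)
        x_int = torch.round(x / self.scale) + self.zero_point
        x_int = torch.clamp(x_int, 0, self.n_levels - 1)
        return (x_int - self.zero_point) * self.scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_q = self.fake_quantize(x)
        if self.training and self.drop_prob > 0:
            # Randomly "drop" quantization: keep the full-precision value
            # element-wise with probability drop_prob, like dropout.
            keep_fp = torch.rand_like(x) < self.drop_prob
            return torch.where(keep_fp, x, x_q)
        return x_q
```

In such a setup, the module would be attached to activations during block-wise reconstruction; at inference time (`module.eval()`), all activations are quantized as usual.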
Cite
Text
Wei et al. "QDrop: Randomly Dropping Quantization for Extremely Low-Bit Post-Training Quantization." International Conference on Learning Representations, 2022.Markdown
[Wei et al. "QDrop: Randomly Dropping Quantization for Extremely Low-Bit Post-Training Quantization." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/wei2022iclr-qdrop/)BibTeX
@inproceedings{wei2022iclr-qdrop,
  title = {{QDrop: Randomly Dropping Quantization for Extremely Low-Bit Post-Training Quantization}},
  author = {Wei, Xiuying and Gong, Ruihao and Li, Yuhang and Liu, Xianglong and Yu, Fengwei},
  booktitle = {International Conference on Learning Representations},
  year = {2022},
  url = {https://mlanthology.org/iclr/2022/wei2022iclr-qdrop/}
}