Towards Accurate Post-Training Network Quantization via Bit-Split and Stitching

Abstract

Network quantization is essential for deploying deep models to IoT devices due to its high efficiency. Most existing quantization approaches rely on full training datasets and time-consuming fine-tuning to retain accuracy. Post-training quantization avoids these requirements; however, it has mainly been shown effective for 8-bit quantization because of its simple optimization strategy. In this paper, we propose a Bit-Split and Stitching framework (Bit-split) for lower-bit post-training quantization with minimal accuracy degradation. The proposed framework is validated on a variety of computer vision tasks, including image classification, object detection, and instance segmentation, with various network architectures. Specifically, Bit-split can achieve near-original model performance even when quantizing FP32 models to INT3 without fine-tuning.
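
The abstract only names the split/stitch mechanism, so the following is a minimal NumPy sketch of the underlying decomposition idea: a b-bit signed integer can be expanded into (b-1) ternary bit planes with digits in {-1, 0, 1}, each of which could be handled separately and then stitched back into the integer. The `bit_split` and `stitch` helpers and the simple max-abs scale are illustrative assumptions, not the paper's actual optimization, which solves for the scale and each bit plane against calibration data.

```python
import numpy as np

def bit_split(q, num_bits):
    """Split signed integers q in [-(2**(num_bits-1)-1), 2**(num_bits-1)-1]
    into (num_bits - 1) ternary bit planes t_i in {-1, 0, 1} such that
    q = sum_i 2**i * t_i (a balanced binary expansion; illustrative only)."""
    planes = []
    r = q.astype(np.int64).copy()
    for _ in range(num_bits - 1):
        t = np.sign(r) * (np.abs(r) & 1)  # signed parity digit of this plane
        planes.append(t)
        r = (r - t) // 2                  # remove the digit, shift right
    return planes

def stitch(planes):
    """Stitch ternary bit planes back into integers: q = sum_i 2**i * t_i."""
    return sum((2 ** i) * t for i, t in enumerate(planes))

# Toy usage: quantize weights to 3-bit signed integers, then split and stitch.
w = np.array([0.31, -0.07, 0.42, -0.55, 0.12])
alpha = np.abs(w).max() / 3.0                         # naive scale (assumed)
q = np.clip(np.round(w / alpha), -3, 3).astype(np.int64)

planes = bit_split(q, num_bits=3)
assert np.array_equal(stitch(planes), q)              # stitching is lossless
print(planes)
print(stitch(planes) * alpha)                         # dequantized weights
```

In this toy version the scale and bit planes come directly from rounding; the point is only that the ternary expansion covers every integer in the signed b-bit range, so per-bit optimization followed by stitching loses nothing representationally.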

Cite

Text

Wang et al. "Towards Accurate Post-Training Network Quantization via Bit-Split and Stitching." International Conference on Machine Learning, 2020.

Markdown

[Wang et al. "Towards Accurate Post-Training Network Quantization via Bit-Split and Stitching." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/wang2020icml-accurate/)

BibTeX

@inproceedings{wang2020icml-accurate,
  title     = {{Towards Accurate Post-Training Network Quantization via Bit-Split and Stitching}},
  author    = {Wang, Peisong and Chen, Qiang and He, Xiangyu and Cheng, Jian},
  booktitle = {International Conference on Machine Learning},
  year      = {2020},
  pages     = {9847--9856},
  volume    = {119},
  url       = {https://mlanthology.org/icml/2020/wang2020icml-accurate/}
}