Towards Accurate Post-Training Network Quantization via Bit-Split and Stitching

Abstract

Network quantization is essential for deploying deep models to IoT devices due to its high efficiency. Most existing quantization approaches rely on full training datasets and time-consuming fine-tuning to retain accuracy. Post-training quantization avoids these requirements; however, it has mainly been shown effective for 8-bit quantization because of its simple optimization strategy. In this paper, we propose a Bit-Split and Stitching framework (Bit-split) for lower-bit post-training quantization with minimal accuracy degradation. The proposed framework is validated on a variety of computer vision tasks, including image classification, object detection, and instance segmentation, with various network architectures. Specifically, Bit-split can achieve near-original model performance even when quantizing FP32 models to INT3 without fine-tuning.
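
The abstract only names the split/stitch mechanism, so the following is a minimal NumPy sketch of the underlying decomposition idea: a b-bit signed integer can be expanded into (b-1) ternary bit planes with digits in {-1, 0, 1}, each of which could be handled separately and then stitched back into the integer. The `bit_split` and `stitch` helpers and the simple max-abs scale are illustrative assumptions, not the paper's actual optimization, which solves for the scale and each bit plane against calibration data.

```python
import numpy as np

def bit_split(q, num_bits):
    """Split signed integers q in [-(2**(num_bits-1)-1), 2**(num_bits-1)-1]
    into (num_bits - 1) ternary bit planes t_i in {-1, 0, 1} such that
    q = sum_i 2**i * t_i (a balanced binary expansion; illustrative only)."""
    planes = []
    r = q.astype(np.int64).copy()
    for _ in range(num_bits - 1):
        t = np.sign(r) * (np.abs(r) & 1)  # signed parity digit of this plane
        planes.append(t)
        r = (r - t) // 2                  # remove the digit, shift right
    return planes

def stitch(planes):
    """Stitch ternary bit planes back into integers: q = sum_i 2**i * t_i."""
    return sum((2 ** i) * t for i, t in enumerate(planes))

# Toy usage: quantize weights to 3-bit signed integers, then split and stitch.
w = np.array([0.31, -0.07, 0.42, -0.55, 0.12])
alpha = np.abs(w).max() / 3.0                         # naive scale (assumed)
q = np.clip(np.round(w / alpha), -3, 3).astype(np.int64)

planes = bit_split(q, num_bits=3)
assert np.array_equal(stitch(planes), q)              # stitching is lossless
print(planes)
print(stitch(planes) * alpha)                         # dequantized weights
```

In this toy version the scale and bit planes come directly from rounding; the point is only that the ternary expansion covers every integer in the signed b-bit range, so per-bit optimization followed by stitching loses nothing representationally.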

Cite

Text

Wang et al. "Towards Accurate Post-Training Network Quantization via Bit-Split and Stitching." International Conference on Machine Learning, 2020.

Markdown

[Wang et al. "Towards Accurate Post-Training Network Quantization via Bit-Split and Stitching." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/wang2020icml-accurate/)

BibTeX

@inproceedings{wang2020icml-accurate,
  title     = {{Towards Accurate Post-Training Network Quantization via Bit-Split and Stitching}},
  author    = {Wang, Peisong and Chen, Qiang and He, Xiangyu and Cheng, Jian},
  booktitle = {International Conference on Machine Learning},
  year      = {2020},
  pages     = {9847--9856},
  volume    = {119},
  url       = {https://mlanthology.org/icml/2020/wang2020icml-accurate/}
}