Adaptive Loss-Aware Quantization for Multi-Bit Networks

Abstract

We investigate the compression of deep neural networks by quantizing their weights and activations into multiple binary bases. The resulting multi-bit networks (MBNs) accelerate inference and reduce storage for deployment on low-resource mobile and embedded platforms. We propose Adaptive Loss-aware Quantization (ALQ), a new MBN quantization pipeline that achieves an average bitwidth below one bit without notable loss in inference accuracy. Unlike previous MBN quantization solutions, which train a quantizer by minimizing the error in reconstructing the full-precision weights, ALQ directly minimizes the quantization-induced error on the loss function, and it involves neither gradient approximation nor full-precision maintenance. ALQ also exploits strategies including adaptive bitwidth, smooth bitwidth reduction, and iterative trained quantization to allow a smaller network size without loss in accuracy. Experimental results on popular image datasets show that ALQ outperforms state-of-the-art compressed networks in terms of both storage and accuracy.
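To make the multi-bit representation concrete, the following is a minimal sketch of quantizing a weight vector into several binary bases, so that w is approximated by a sum of terms alpha_i * b_i with each b_i in {-1, +1}^n. It uses a greedy residual fit that minimizes reconstruction error, which is the style of objective the abstract attributes to previous MBN solutions; it is not ALQ's loss-aware procedure. The function names `greedy_multibit_quantize` and `reconstruct`, the use of NumPy, and the random test vector are assumptions made for illustration only.

```python
import numpy as np


def greedy_multibit_quantize(w, num_bits):
    """Greedily approximate w by sum_i alphas[i] * bases[i], bases[i] in {-1, +1}^n.

    Illustrative reconstruction-error baseline, not the ALQ algorithm.
    Requires num_bits >= 1.
    """
    residual = np.asarray(w, dtype=np.float64).copy()
    alphas, bases = [], []
    for _ in range(num_bits):
        # Binary basis: the sign pattern of the current residual.
        b = np.where(residual >= 0, 1.0, -1.0)
        # Least-squares optimal scale for this basis: mean absolute residual.
        alpha = np.abs(residual).mean()
        alphas.append(alpha)
        bases.append(b)
        # Subtract what this basis explains and fit the remainder next.
        residual = residual - alpha * b
    return np.array(alphas), np.stack(bases)


def reconstruct(alphas, bases):
    """Rebuild the approximate weights from the scales and binary bases."""
    return alphas @ bases


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(256)  # hypothetical weight group
    for k in (1, 2, 3):
        alphas, bases = greedy_multibit_quantize(w, k)
        err = np.linalg.norm(w - reconstruct(alphas, bases))
        print(f"{k} bit(s): reconstruction error = {err:.4f}")
```

In this sketch, each additional binary basis can only reduce the reconstruction error, at the cost of storing one more binary vector and one more scale. An adaptive-bitwidth scheme such as the one the abstract describes would choose the number of bases per group of weights rather than fixing it globally.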

Cite

Text

Qu et al. "Adaptive Loss-Aware Quantization for Multi-Bit Networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00801

Markdown

[Qu et al. "Adaptive Loss-Aware Quantization for Multi-Bit Networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/qu2020cvpr-adaptive/) doi:10.1109/CVPR42600.2020.00801

BibTeX

@inproceedings{qu2020cvpr-adaptive,
  title     = {{Adaptive Loss-Aware Quantization for Multi-Bit Networks}},
  author    = {Qu, Zhongnan and Zhou, Zimu and Cheng, Yun and Thiele, Lothar},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.00801},
  url       = {https://mlanthology.org/cvpr/2020/qu2020cvpr-adaptive/}
}