Adaptive Loss-Aware Quantization for Multi-Bit Networks
Abstract
We investigate the compression of deep neural networks by quantizing their weights and activations into multiple binary bases, known as multi-bit networks (MBNs), which accelerate inference and reduce storage for deployment on low-resource mobile and embedded platforms. We propose Adaptive Loss-aware Quantization (ALQ), a new MBN quantization pipeline that achieves an average bitwidth below one bit without notable loss in inference accuracy. Unlike previous MBN quantization solutions, which train a quantizer by minimizing the error in reconstructing the full-precision weights, ALQ directly minimizes the quantization-induced error on the loss function, involving neither gradient approximation nor full-precision maintenance. ALQ also exploits strategies including adaptive bitwidth, smooth bitwidth reduction, and iterative trained quantization to allow a smaller network size without loss in accuracy. Experimental results on popular image datasets show that ALQ outperforms state-of-the-art compressed networks in terms of both storage and accuracy.
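To make the contrast between reconstruction-error and loss-aware quantization concrete, here is a minimal pure-Python sketch. It is not the authors' ALQ implementation. It assumes the common MBN form, in which a group of weights w is approximated as the sum over i of alpha_i * beta_i with each beta_i in {-1, +1}^n. The bases are fitted greedily on the residual, a simple stand-in for the paper's optimizer. The per-weight factors d are hypothetical: uniform d gives the plain reconstruction objective of earlier MBN quantizers, while d_j > 0 read as a diagonal estimate of the loss curvature turns the same objective into a second-order, loss-aware proxy. All function names are illustrative.

```python
from typing import List, Sequence, Tuple


def sign(x: float) -> float:
    # Map 0 to +1 so every basis entry lies in {-1, +1}.
    return 1.0 if x >= 0 else -1.0


def weighted_binarize(w: Sequence[float], d: Sequence[float]) -> Tuple[float, List[float]]:
    """Fit one binary basis by minimizing sum_j d_j * (w_j - alpha * b_j)^2.

    For d_j > 0 the minimizer is b = sign(w), alpha = sum_j d_j|w_j| / sum_j d_j.
    Uniform d recovers the plain reconstruction-error objective.
    """
    if len(w) != len(d) or any(dj <= 0 for dj in d):
        raise ValueError("d must be positive and match w in length")
    b = [sign(x) for x in w]
    alpha = sum(dj * abs(wj) for dj, wj in zip(d, w)) / sum(d)
    return alpha, b


def multi_bit_quantize(
    w: Sequence[float], d: Sequence[float], num_bases: int
) -> Tuple[List[float], List[List[float]]]:
    """Greedy residual fitting of num_bases binary bases (a stand-in optimizer)."""
    residual = list(w)
    alphas: List[float] = []
    bases: List[List[float]] = []
    for _ in range(num_bases):
        alpha, b = weighted_binarize(residual, d)
        alphas.append(alpha)
        bases.append(b)
        # The next basis approximates whatever this one left unexplained.
        residual = [r - alpha * bj for r, bj in zip(residual, b)]
    return alphas, bases


def reconstruct(alphas: List[float], bases: List[List[float]], n: int) -> List[float]:
    """w_hat_j = sum_i alpha_i * beta_i[j]; with zero bases this is all zeros."""
    return [sum(a * b[j] for a, b in zip(alphas, bases)) for j in range(n)]


def weighted_error(w: Sequence[float], w_hat: Sequence[float], d: Sequence[float]) -> float:
    """Diagonal quadratic proxy for the loss increase when d approximates curvature."""
    return sum(dj * (x - y) ** 2 for dj, x, y in zip(d, w, w_hat))


def average_bitwidth(bases_per_group: Sequence[int]) -> float:
    """Mean number of binary bases per group, assuming equal-size groups.

    Each basis costs one bit per weight; a group with 0 bases is pruned.
    The full-precision coefficients alpha are not counted here.
    """
    return sum(bases_per_group) / len(bases_per_group)


if __name__ == "__main__":
    w = [0.9, -0.2, 0.4, -1.1]
    uniform = [1.0] * len(w)
    curvature = [10.0, 1.0, 1.0, 1.0]  # hypothetical: first weight matters most
    for name, d in [("reconstruction", uniform), ("loss-aware", curvature)]:
        alphas, bases = multi_bit_quantize(w, d, num_bases=1)
        w_hat = reconstruct(alphas, bases, len(w))
        print(name, alphas, weighted_error(w, w_hat, curvature))
    print(average_bitwidth([0, 1, 2, 0, 1]))
```

With the same single basis, weighting by d shifts alpha toward the weights the loss is most sensitive to, so the loss proxy is lower than with the uniform (reconstruction) fit. In this sketch each basis costs one bit per weight, so allowing some groups to drop to zero bases is what lets the average fall below one bit, for example four bases spread over five groups gives 0.8 bits.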
Cite
Qu et al. "Adaptive Loss-Aware Quantization for Multi-Bit Networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00801
@inproceedings{qu2020cvpr-adaptive,
title = {{Adaptive Loss-Aware Quantization for Multi-Bit Networks}},
author = {Qu, Zhongnan and Zhou, Zimu and Cheng, Yun and Thiele, Lothar},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2020},
doi = {10.1109/CVPR42600.2020.00801},
url = {https://mlanthology.org/cvpr/2020/qu2020cvpr-adaptive/}
}