FracBits: Mixed Precision Quantization via Fractional Bit-Widths

Abstract

Model quantization helps reduce the model size and latency of deep neural networks. Mixed precision quantization is favorable on customized hardware that supports arithmetic operations at multiple bit-widths, enabling maximum efficiency. We propose a novel learning-based algorithm to derive mixed precision models end-to-end under target computation constraints and model sizes. During optimization, the bit-width of each layer/kernel in the model takes a fractional value between two consecutive bit-widths, which can be adjusted gradually. With a differentiable regularization term, the resource constraints can be met during quantization-aware training, resulting in an optimized mixed precision model. Our final mixed precision models achieve comparable or better performance than previous quantization methods on MobileNetV1/V2 and ResNet18 under different resource constraints on the ImageNet dataset.
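The abstract describes two ingredients: a quantizer whose bit-width is a continuous value interpolating between the two neighboring integer bit-widths, and a differentiable penalty that steers the total resource cost toward a budget during quantization-aware training. Below is a minimal PyTorch sketch of that idea under stated assumptions; the function names, the uniform quantizer, the penalty form, and the toy layer costs are illustrative choices, not the paper's implementation (e.g., the straight-through estimator and per-kernel bit-widths are omitted).

```python
import torch

def quantize_uniform(x, num_bits):
    # Uniform quantization of a tensor normalized to [0, 1] at an integer
    # bit-width (straight-through gradient for the rounding is omitted here).
    levels = 2 ** num_bits - 1
    return torch.round(x * levels) / levels

def quantize_fractional(x, bit_width):
    # Fractional bit-width: blend the quantizations at the two neighboring
    # integer bit-widths, weighted by the fractional part. The blending weight
    # keeps the result differentiable w.r.t. the (learnable) bit-width.
    lo = int(torch.floor(bit_width))
    frac = bit_width - lo
    return (1.0 - frac) * quantize_uniform(x, lo) + frac * quantize_uniform(x, lo + 1)

def resource_penalty(bit_widths, costs, budget):
    # Differentiable penalty pushing the bit-width-weighted cost
    # (e.g., model size or compute) below a target budget.
    total = sum(b * c for b, c in zip(bit_widths, costs))
    return torch.relu(total - budget)

# Toy usage: one learnable bit-width per layer (hypothetical two-layer model).
bit_widths = [torch.tensor(5.3, requires_grad=True), torch.tensor(3.7, requires_grad=True)]
weights = [torch.rand(8, 8), torch.rand(16, 8)]
costs = [w.numel() for w in weights]

q_weights = [quantize_fractional(w, b) for w, b in zip(weights, bit_widths)]
task_proxy = sum((qw - w).pow(2).mean() for qw, w in zip(q_weights, weights))
loss = task_proxy + 1e-4 * resource_penalty(bit_widths, costs, budget=300)
loss.backward()  # gradients flow into the fractional bit-widths
```

In this sketch the bit-widths are trained jointly with the network and, once training converges, would be rounded to the nearest integers to obtain the final mixed precision assignment.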

Cite

Text

Yang and Jin. "FracBits: Mixed Precision Quantization via Fractional Bit-Widths." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I12.17269

Markdown

[Yang and Jin. "FracBits: Mixed Precision Quantization via Fractional Bit-Widths." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/yang2021aaai-fracbits/) doi:10.1609/AAAI.V35I12.17269

BibTeX

@inproceedings{yang2021aaai-fracbits,
  title     = {{FracBits: Mixed Precision Quantization via Fractional Bit-Widths}},
  author    = {Yang, Linjie and Jin, Qing},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {10612-10620},
  doi       = {10.1609/AAAI.V35I12.17269},
  url       = {https://mlanthology.org/aaai/2021/yang2021aaai-fracbits/}
}