Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks

Abstract

The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference, facilitating the use of DNNs on edge computing platforms. Recent efforts at quantizing DNNs have employed a range of techniques encompassing progressive quantization, step-size adaptation, and gradient scaling. This paper proposes a new quantization approach for mixed precision convolutional neural networks (CNNs) targeting edge computing. Our method establishes a new Pareto frontier in model accuracy and memory footprint, demonstrating a range of pre-trained quantized models that deliver best-in-class accuracy below 4.3 MB of weights and activations without modifying the model architecture. Our main contributions are: (i) a method for tensor-sliced learned precision with a hardware-aware cost function for heterogeneous differentiable quantization, (ii) targeted gradient modification for weights and activations to mitigate quantization errors, and (iii) a multi-phase learning schedule to address instability in learning arising from updates to the learned quantizer and model parameters. We demonstrate the effectiveness of our techniques on the ImageNet dataset across a range of models including EfficientNet-Lite0 (e.g., 4.14 MB of weights and activations at 67.66% accuracy) and MobileNetV2 (e.g., 3.51 MB of weights and activations at 65.39% accuracy).
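
To make the memory argument in the abstract concrete, the following is a minimal sketch of generic symmetric uniform quantization and of how a mixed precision footprint adds up. It is not the paper's learned, differentiable quantizer: the function names `quantize_uniform` and `footprint_megabytes`, the layer sizes, and the bit widths are all hypothetical and chosen only for illustration.

```python
def quantize_uniform(values, bits, max_abs):
    """Symmetric uniform quantization of floats to `bits` bits (bits >= 2).

    Generic textbook scheme, assumed here for illustration only.
    """
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels for 4 bits
    step = max_abs / levels               # quantization step size
    out = []
    for v in values:
        q = round(v / step)               # nearest integer level
        q = max(-levels, min(levels, q))  # clip to the representable range
        out.append(q * step)              # map back to a real value
    return out


def footprint_megabytes(layers):
    """Storage for (num_elements, bits) pairs, in MB (10**6 bytes)."""
    total_bits = sum(n * b for n, b in layers)
    return total_bits / 8 / 1e6


# Hypothetical tensors: (number of elements, assigned bit width).
layers = [(1_000_000, 8), (2_000_000, 4), (500_000, 2)]

if __name__ == "__main__":
    print(quantize_uniform([0.3, -1.2, 0.05], bits=4, max_abs=1.0))
    print(footprint_megabytes(layers))
```

Under these assumed sizes, lowering the bit width of the largest tensor reduces the total footprint the most, which is the intuition behind assigning precision per tensor rather than uniformly.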

Cite

Text

Schaefer et al. "Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks." Winter Conference on Applications of Computer Vision, 2024.

Markdown

[Schaefer et al. "Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/schaefer2024wacv-edge/)

BibTeX

@inproceedings{schaefer2024wacv-edge,
  title     = {{Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks}},
  author    = {Schaefer, Clemens JS and Joshi, Siddharth and Li, Shan and Blazquez, Raul},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2024},
  pages     = {8460--8469},
  url       = {https://mlanthology.org/wacv/2024/schaefer2024wacv-edge/}
}