Boosting Deep Neural Network Efficiency with Dual-Module Inference

Abstract

Using deep neural networks (DNNs) in machine learning tasks promises high-quality results, but the memory-bound and compute-bound execution patterns of DNNs make it challenging to meet stringent latency requirements and energy constraints. We propose big-little dual-module inference, which dynamically skips unnecessary memory accesses and computations to accelerate DNN inference. Leveraging the noise resilience of nonlinear activation functions, we use a lightweight little module that approximates the original DNN layer, termed the big module, to compute activations in the insensitive region, where outputs are more tolerant of approximation noise. The expensive memory accesses and computations of the big module are thereby reduced, since its results are computed only for the sensitive region. For memory-bound models such as recurrent neural networks (RNNs), our method reduces overall memory accesses by 40% on average and achieves 1.54x to 1.75x speedup on a commodity CPU-based server platform with negligible impact on model quality. For compute-bound models such as convolutional neural networks (CNNs), it reduces operations by 3.02x with only a 0.5% accuracy drop.
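To make the mechanism concrete, below is a minimal sketch (not the authors' implementation) of dual-module inference for a single tanh layer. The rank-8 low-rank little module, the `threshold` value, and the function name `dual_module_linear` are illustrative assumptions: the little module cheaply approximates every pre-activation, and the big module is consulted only where the approximation lands in tanh's sensitive, near-zero region.

```python
import numpy as np

def dual_module_linear(x, W_big, U_r, s_r, Vt_r, threshold=2.0):
    """Sketch of big-little dual-module inference for one tanh layer.

    The little module here is a rank-r factorization of W_big, so its
    pre-activations cost O(r * (d_in + d_out)) instead of O(d_in * d_out).
    tanh saturates for large |z|, so outputs whose approximate
    pre-activation is far from zero are noise-insensitive and keep the
    little module's result; only sensitive outputs are recomputed with
    the big module.
    """
    z_little = U_r @ (s_r * (Vt_r @ x))        # cheap approximate pre-activations
    sensitive = np.abs(z_little) < threshold   # near zero: tanh output is sensitive
    z = z_little.copy()
    # Touch the big module's weights (and their memory) only for sensitive rows.
    z[sensitive] = W_big[sensitive] @ x
    return np.tanh(z)

# Toy usage with a hypothetical rank-8 little module built via truncated SVD.
rng = np.random.default_rng(0)
d_out, d_in, rank = 256, 128, 8
W_big = rng.standard_normal((d_out, d_in))
U, S, Vt = np.linalg.svd(W_big, full_matrices=False)
y = dual_module_linear(rng.standard_normal(d_in),
                       W_big, U[:, :rank], S[:rank], Vt[:rank])
```

In an actual deployment the little module could also be quantized or pruned, and the row gather `W_big[sensitive]` would translate into skipped weight fetches from memory rather than a NumPy copy; the sketch only illustrates the sensitive/insensitive split.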

Cite

Text

Liu et al. "Boosting Deep Neural Network Efficiency with Dual-Module Inference." International Conference on Machine Learning, 2020.

Markdown

[Liu et al. "Boosting Deep Neural Network Efficiency with Dual-Module Inference." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/liu2020icml-boosting/)

BibTeX

@inproceedings{liu2020icml-boosting,
  title     = {{Boosting Deep Neural Network Efficiency with Dual-Module Inference}},
  author    = {Liu, Liu and Deng, Lei and Chen, Zhaodong and Wang, Yuke and Li, Shuangchen and Zhang, Jingwei and Yang, Yihua and Gu, Zhenyu and Ding, Yufei and Xie, Yuan},
  booktitle = {International Conference on Machine Learning},
  year      = {2020},
  pages     = {6205--6215},
  volume    = {119},
  url       = {https://mlanthology.org/icml/2020/liu2020icml-boosting/}
}