Efficient Deep Learning Inference Based on Model Compression
Abstract
Deep neural networks (DNNs) have evolved remarkably over the last decade and achieved great success in many machine learning tasks. As deep learning (DL) methods have evolved, the computational complexity and resource consumption of DL models have continued to increase, which makes efficient deployment challenging, especially on devices with limited memory or in applications with strict latency requirements. In this paper, we introduce a DL inference optimization pipeline that consists of a series of model compression methods, including Tensor Decomposition (TD), Graph Adaptive Pruning (GAP), Intrinsic Sparse Structures (ISS) in Long Short-Term Memory (LSTM), Knowledge Distillation (KD), and low-bit model quantization. We test this inference optimization pipeline with the above methods in different modeling scenarios, and it shows promising results, making inference more efficient with only marginal loss of model accuracy.
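To give a concrete sense of one component of such a pipeline, the sketch below illustrates a standard knowledge distillation loss (soft teacher targets blended with hard labels). This is a generic PyTorch-style example, not the authors' implementation; the function name `distillation_loss` and the hyperparameters `T` and `alpha` are illustrative assumptions.

```python
# Minimal knowledge-distillation sketch (one compression technique named in the
# abstract). Not the paper's implementation; T and alpha are illustrative values.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-target term: KL divergence between temperature-softened teacher and
    # student distributions, scaled by T^2 to keep gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Example usage with random tensors (batch of 8, 10 classes):
# loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10),
#                          torch.randint(0, 10, (8,)))
```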
Cite
Text
Zhang et al. "Efficient Deep Learning Inference Based on Model Compression." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018. doi:10.1109/CVPRW.2018.00221
Markdown
[Zhang et al. "Efficient Deep Learning Inference Based on Model Compression." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018.](https://mlanthology.org/cvprw/2018/zhang2018cvprw-efficient/) doi:10.1109/CVPRW.2018.00221
BibTeX
@inproceedings{zhang2018cvprw-efficient,
title = {{Efficient Deep Learning Inference Based on Model Compression}},
author = {Zhang, Qing and Zhang, Mengru and Wang, Mengdi and Sui, Wanchen and Meng, Chen and Yang, Jun and Kong, Weidan and Cui, Xiaoyuan and Lin, Wei},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2018},
pages = {1695--1702},
doi = {10.1109/CVPRW.2018.00221},
url = {https://mlanthology.org/cvprw/2018/zhang2018cvprw-efficient/}
}