A Hardware Prototype Targeting Distributed Deep Learning for On-Device Inference
Abstract
This paper presents a hardware prototype and a framework for new communication-aware model compression for distributed on-device inference. Our approach relies on Knowledge Distillation (KD) and achieves orders-of-magnitude compression ratios on a large pre-trained teacher model. The distributed hardware prototype consists of multiple student models deployed on Raspberry Pi 3 nodes that run Wide ResNet and VGG models on the CIFAR-10 dataset for real-time image classification. We observe significant reductions in memory footprint (50×), energy consumption (14×), and latency (33×), along with a 12× increase in performance, without any significant accuracy loss compared to the initial teacher model. This is an important step towards deploying deep learning models for IoT applications.
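For readers unfamiliar with the distillation step, the sketch below shows the standard KD objective (Hinton et al.): a temperature-softened KL term between teacher and student logits, mixed with the usual cross-entropy on ground-truth labels. This is a minimal PyTorch illustration of the general technique, not the paper's exact recipe; the `temperature` and `alpha` values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.9):
    """Standard KD loss: weighted sum of a soft-target KL term
    and the hard-label cross-entropy term.
    NOTE: temperature and alpha are illustrative, not the paper's settings."""
    # Soften both distributions with the temperature T.
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # KL divergence between soft targets, scaled by T^2 as is conventional.
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Ordinary cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

In a training loop, the teacher runs in evaluation mode with gradients disabled and only the student's parameters are updated, which is what lets the compact student models fit on the Raspberry Pi 3 nodes.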
Cite
Text
Farcas et al. "A Hardware Prototype Targeting Distributed Deep Learning for On-Device Inference." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020. doi:10.1109/CVPRW50498.2020.00207
Markdown
[Farcas et al. "A Hardware Prototype Targeting Distributed Deep Learning for On-Device Inference." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.](https://mlanthology.org/cvprw/2020/farcas2020cvprw-hardware/) doi:10.1109/CVPRW50498.2020.00207
BibTeX
@inproceedings{farcas2020cvprw-hardware,
title = {{A Hardware Prototype Targeting Distributed Deep Learning for On-Device Inference}},
author = {Farcas, Allen-Jasmin and Li, Guihong and Bhardwaj, Kartikeya and Marculescu, Radu},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2020},
pages = {1600--1601},
doi = {10.1109/CVPRW50498.2020.00207},
url = {https://mlanthology.org/cvprw/2020/farcas2020cvprw-hardware/}
}