UDC: Unified DNAS for Compressible TinyML Models for Neural Processing Units
Abstract
Deploying TinyML models on low-cost IoT hardware is very challenging due to limited device memory capacity. Neural processing unit (NPU) hardware addresses the memory challenge by using model compression to exploit weight quantization and sparsity, fitting more parameters in the same memory footprint. However, designing compressible neural networks (NNs) is challenging, as it expands the design space across which we must make balanced trade-offs. This paper demonstrates Unified DNAS for Compressible (UDC) NNs, which explores a large search space to generate state-of-the-art compressible NNs for NPUs. ImageNet results show UDC networks are up to 3.35x smaller (iso-accuracy) or 6.25% more accurate (iso-model size) than previous work.
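The abstract's core premise is that quantization and sparsity jointly reduce a model's memory footprint. The sketch below, a hypothetical illustration and not the paper's code, shows the idea on a single weight tensor: magnitude pruning followed by uniform 4-bit quantization, with an idealized footprint count that ignores sparse-index overhead.

```python
import numpy as np

# Hypothetical illustration of quantization + sparsity compression,
# as an NPU compression scheme might exploit. Pruning ratio and bit
# width are arbitrary choices for the example.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)

# Prune: zero out the 75% of weights with the smallest magnitude.
threshold = np.quantile(np.abs(w), 0.75)
w_sparse = np.where(np.abs(w) > threshold, w, 0.0).astype(np.float32)

# Quantize surviving weights to 4-bit signed integers (uniform, symmetric).
bits = 4
scale = np.abs(w_sparse).max() / (2 ** (bits - 1) - 1)
w_q = np.round(w_sparse / scale).astype(np.int8)

dense_fp32_bytes = w.size * 4
nonzeros = int(np.count_nonzero(w_q))
# Idealized footprint: `bits` per nonzero weight, ignoring index overhead.
compressed_bytes = nonzeros * bits / 8
print(dense_fp32_bytes, compressed_bytes)
```

In this toy setting the tensor shrinks by roughly an order of magnitude; the paper's contribution is searching for architectures that remain accurate under exactly this kind of compression, rather than compressing a fixed network after the fact.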
Cite
Text

Fedorov et al. "UDC: Unified DNAS for Compressible TinyML Models for Neural Processing Units." Neural Information Processing Systems, 2022.

Markdown

[Fedorov et al. "UDC: Unified DNAS for Compressible TinyML Models for Neural Processing Units." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/fedorov2022neurips-udc/)

BibTeX
@inproceedings{fedorov2022neurips-udc,
title = {{UDC: Unified DNAS for Compressible TinyML Models for Neural Processing Units}},
author = {Fedorov, Igor and Matas, Ramon and Tann, Hokchhay and Zhou, Chuteng and Mattina, Matthew and Whatmough, Paul},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/fedorov2022neurips-udc/}
}