Dissecting Efficient Architectures for Wake-Word Detection
Abstract
Wake-word detection models running on edge devices have stringent efficiency requirements. We observe that the over-the-air test accuracy of models trained on parallel devices (GPU/TPU) usually degrades when they are deployed on edge devices that use a CPU for over-the-air, real-time inference. Further, the increase in inference time when migrating from GPU to CPU varies across models. The accuracy drop is due to hardware latency and the acoustic impulse response, while the non-uniform growth in inference time results from the models' varying exploitation of hardware acceleration. We compare five Convolutional Neural Network (CNN) architectures and one pure Transformer architecture, train them for wake-word detection on the Speech Commands dataset, and quantize two representative models. We seek to quantify their accuracy-efficiency tradeoffs to inform researchers and practitioners about the key model components that influence this tradeoff.
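To make the quantization and CPU-latency measurement concrete, here is a minimal sketch, not the authors' implementation: it assumes a toy PyTorch CNN (`TinyKWSNet`, an invented stand-in for the paper's models) and a 40x101 log-Mel spectrogram input, applies post-training dynamic quantization, and times CPU inference, the kind of edge-device measurement the abstract refers to.

```python
# Minimal sketch (illustrative, not the paper's code): quantize a small
# keyword-spotting CNN with PyTorch post-training dynamic quantization
# and time CPU inference. Model definition and input shape are assumptions.
import time
import torch
import torch.nn as nn

class TinyKWSNet(nn.Module):
    """Toy CNN for 1 s of 16 kHz audio rendered as a 40x101 log-Mel spectrogram."""
    def __init__(self, num_classes: int = 12):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

model = TinyKWSNet().eval()

# Dynamic quantization converts Linear layers to int8 on the fly; quantizing
# the conv layers would require static quantization with calibration data,
# which is omitted here for brevity.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 1, 40, 101)  # one spectrogram, batch size 1 as on-device
with torch.inference_mode():
    start = time.perf_counter()
    for _ in range(100):
        quantized(x)
    print(f"mean CPU latency: {(time.perf_counter() - start) / 100 * 1e3:.2f} ms")
```

Batch size 1 is used deliberately: real-time wake-word inference processes one audio window at a time, so single-sample CPU latency is the relevant quantity rather than batched throughput.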
Cite
Text
Berger et al. "Dissecting Efficient Architectures for Wake-Word Detection." ICML 2023 Workshops: ES-FoMO, 2023.
Markdown
[Berger et al. "Dissecting Efficient Architectures for Wake-Word Detection." ICML 2023 Workshops: ES-FoMO, 2023.](https://mlanthology.org/icmlw/2023/berger2023icmlw-dissecting/)
BibTeX
@inproceedings{berger2023icmlw-dissecting,
  title     = {{Dissecting Efficient Architectures for Wake-Word Detection}},
  author    = {Berger, Cody and Li, Juncheng B and Li, Yiyuan and Berger, Aaron and Berger, Dmitri and Ganesan, Karthik and Strubell, Emma and Metze, Florian},
  booktitle = {ICML 2023 Workshops: ES-FoMO},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/berger2023icmlw-dissecting/}
}