Dissecting Efficient Architectures for Wake-Word Detection

Abstract

Wake-word detection models running on edge devices have stringent efficiency requirements. We observe that the over-the-air test accuracy of models trained on parallel hardware (GPU/TPU) usually degrades when they are deployed on edge devices that use a CPU for over-the-air, real-time inference. Further, the change in inference time when migrating from GPU to CPU varies across models. The accuracy drop is due to hardware latency and acoustic impulse response, while the non-uniform growth in inference time results from models' varying exploitation of hardware acceleration. We compare five Convolutional Neural Network (CNN) architectures and one pure Transformer architecture, train them for wake-word detection on the Speech Commands dataset, and quantize two representative models. We quantify their accuracy-efficiency tradeoffs to inform researchers and practitioners about the key model components influencing this tradeoff.
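The abstract mentions quantizing two representative models for CPU deployment; the paper's exact recipe is not reproduced here, so the following is only a minimal sketch of one common approach, post-training dynamic quantization in PyTorch, applied to a hypothetical small keyword-spotting CNN (the `TinyKWS` module below is an illustrative stand-in, not one of the paper's architectures):

```python
import torch
import torch.nn as nn

# Hypothetical small keyword-spotting CNN standing in for one of the
# paper's representative models; the real architectures differ.
class TinyKWS(nn.Module):
    def __init__(self, n_classes: int = 12):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(16, n_classes)

    def forward(self, x):
        # x: (batch, 1, n_mels, n_frames) log-mel spectrogram features
        return self.fc(self.conv(x).flatten(1))

model = TinyKWS().eval()

# Post-training dynamic quantization: weights of the listed module types
# are stored in int8 and dequantized on the fly, targeting CPU inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    scores = quantized(torch.randn(1, 1, 40, 101))  # one ~1 s utterance
```

Dynamic quantization is the lowest-effort option because it needs no calibration data; static or quantization-aware schemes trade more setup for further latency and size gains on CPU.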

Cite

Text

Berger et al. "Dissecting Efficient Architectures for Wake-Word Detection." ICML 2023 Workshops: ES-FoMO, 2023.

Markdown

[Berger et al. "Dissecting Efficient Architectures for Wake-Word Detection." ICML 2023 Workshops: ES-FoMO, 2023.](https://mlanthology.org/icmlw/2023/berger2023icmlw-dissecting/)

BibTeX

@inproceedings{berger2023icmlw-dissecting,
  title     = {{Dissecting Efficient Architectures for Wake-Word Detection}},
  author    = {Berger, Cody and Li, Juncheng B and Li, Yiyuan and Berger, Aaron and Berger, Dmitri and Ganesan, Karthik and Strubell, Emma and Metze, Florian},
  booktitle = {ICML 2023 Workshops: ES-FoMO},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/berger2023icmlw-dissecting/}
}