Talk Is Cheap, Energy Is Not: Towards a Green, Context-Aware Metrics Framework for Automatic Speech Recognition

Ulan, Maria; Husom, Erik Johannes; Van den Abeele, Jeriek

doi:10.1007/978-3-032-06118-8_3

ECML-PKDD 2025 pp. 36-54

Abstract

Automatic Speech Recognition (ASR) systems are increasingly deployed across diverse computing environments, from cloud servers to edge devices. While accuracy has traditionally been the primary evaluation metric, the inference efficiency of these systems, including energy consumption, memory usage, and hardware utilisation, significantly impacts their practical usability. This paper introduces a novel benchmarking framework that assesses ASR models during inference from both performance and sustainability perspectives. We introduce a multi-metric evaluation approach quantifying Word Error Rate (WER), Real-Time Factor (RTF), Energy Per Audio Second (EPAS), inference latency, GPU Memory Efficiency (GME), and Hardware Utilisation Rate (HUR). Our framework includes configurable weighting schemes tailored for various deployment scenarios: balanced general-purpose evaluation, resource-constrained environments, high-throughput batch inference, and real-time processing. To demonstrate the utility of the framework, we benchmark several state-of-the-art ASR architectures (Whisper, Wav2Vec2, HuBERT, WavLM, UniSpeech, and SpeechT5) in both FP16 and FP32 precision on NVIDIA Jetson AGX Orin hardware. The proposed methodology supports researchers and practitioners in making informed model selection decisions based on context-specific inference requirements. By illuminating performance–consumption trade-offs, the metrics framework can help to reduce computational costs and the carbon footprint of ASR systems, while maintaining acceptable accuracy.
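
The abstract names the metrics but does not define them, so the following is only a minimal sketch of how two of them and a scenario-weighted score might be computed. It assumes the conventional definition of RTF (processing time divided by audio duration) and reads EPAS literally as energy divided by audio duration. The `InferenceRun` fields, the `weighted_cost` helper, and the weight values are hypothetical illustrations, not the authors' definitions.

```python
from dataclasses import dataclass


@dataclass
class InferenceRun:
    """Measurements from one inference run (hypothetical field names)."""
    audio_seconds: float       # duration of the transcribed audio
    processing_seconds: float  # wall-clock time spent on inference
    energy_joules: float       # energy drawn during inference
    wer: float                 # word error rate, as a fraction in [0, 1]


def real_time_factor(run: InferenceRun) -> float:
    """Conventional RTF: processing time divided by audio duration."""
    return run.processing_seconds / run.audio_seconds


def energy_per_audio_second(run: InferenceRun) -> float:
    """EPAS read literally: joules consumed per second of audio (assumption)."""
    return run.energy_joules / run.audio_seconds


def weighted_cost(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of lower-is-better metrics already scaled to [0, 1].

    This aggregation is an illustrative assumption, not the paper's formula.
    """
    total = sum(weights.values())
    return sum(weights[name] * metrics[name] for name in weights) / total


# Hypothetical weighting schemes for two of the deployment scenarios.
HYPOTHETICAL_WEIGHTS = {
    "balanced": {"wer": 1.0, "rtf": 1.0, "epas": 1.0},
    "resource_constrained": {"wer": 1.0, "rtf": 1.0, "epas": 2.0},
}

# Made-up measurements, used only to exercise the functions.
run = InferenceRun(audio_seconds=10.0, processing_seconds=2.0,
                   energy_joules=50.0, wer=0.08)
rtf = real_time_factor(run)
epas = energy_per_audio_second(run)
```

Before weighting, each metric would need to be normalised to a common scale across the models being compared; how that normalisation is done is left open here because the abstract does not specify it.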

Cite

Text

Ulan et al. "Talk Is Cheap, Energy Is Not: Towards a Green, Context-Aware Metrics Framework for Automatic Speech Recognition." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06118-8_3

Markdown

[Ulan et al. "Talk Is Cheap, Energy Is Not: Towards a Green, Context-Aware Metrics Framework for Automatic Speech Recognition." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/ulan2025ecmlpkdd-talk/) doi:10.1007/978-3-032-06118-8_3

BibTeX

@inproceedings{ulan2025ecmlpkdd-talk,
  title     = {{Talk Is Cheap, Energy Is Not: Towards a Green, Context-Aware Metrics Framework for Automatic Speech Recognition}},
  author    = {Ulan, Maria and Husom, Erik Johannes and Van den Abeele, Jeriek},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2025},
  pages     = {36--54},
  doi       = {10.1007/978-3-032-06118-8_3},
  url       = {https://mlanthology.org/ecmlpkdd/2025/ulan2025ecmlpkdd-talk/}
}