Talk Is Cheap, Energy Is Not: Towards a Green, Context-Aware Metrics Framework for Automatic Speech Recognition
Abstract
Automatic Speech Recognition (ASR) systems are increasingly deployed across diverse computing environments, from cloud servers to edge devices. While accuracy has traditionally been the primary evaluation metric, the inference efficiency of these systems, including energy consumption, memory usage, and hardware utilisation, significantly impacts their practical usability. This paper introduces a novel benchmarking framework that assesses ASR models during inference from both performance and sustainability perspectives. We introduce a multi-metric evaluation approach quantifying Word Error Rate (WER), Real-Time Factor (RTF), Energy Per Audio Second (EPAS), inference latency, GPU Memory Efficiency (GME), and Hardware Utilisation Rate (HUR). Our framework includes configurable weighting schemes tailored for various deployment scenarios: balanced general-purpose evaluation, resource-constrained environments, high-throughput batch inference, and real-time processing. To demonstrate the utility of the framework, we benchmark several state-of-the-art ASR architectures (Whisper, Wav2Vec2, HuBERT, WavLM, UniSpeech, and SpeechT5) in both FP16 and FP32 precision on NVIDIA Jetson AGX Orin hardware. The proposed methodology supports researchers and practitioners in making informed model selection decisions based on context-specific inference requirements. By illuminating performance–consumption trade-offs, the metrics framework can help to reduce computational costs and the carbon footprint of ASR systems, while maintaining acceptable accuracy.
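As a rough illustration of how such metrics and scenario weights might be combined, the following minimal sketch computes RTF and EPAS from raw measurements and folds normalised metrics into one weighted cost. RTF uses its standard definition (processing time divided by audio duration), and EPAS is read literally from its name as joules per second of audio. The `InferenceRun` fields, the `weighted_cost` helper, the assumption that metrics are pre-scaled to [0, 1], and the weight values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class InferenceRun:
    """Raw measurements for one benchmark run (field names are illustrative)."""
    audio_seconds: float       # total duration of the transcribed audio
    processing_seconds: float  # wall-clock time spent on inference
    energy_joules: float       # energy drawn during inference
    wer: float                 # word error rate, e.g. 0.10 for 10%


def real_time_factor(run: InferenceRun) -> float:
    """RTF = processing time / audio duration; below 1 is faster than real time."""
    return run.processing_seconds / run.audio_seconds


def energy_per_audio_second(run: InferenceRun) -> float:
    """EPAS in joules per second of audio (assumed from the metric's name)."""
    return run.energy_joules / run.audio_seconds


def weighted_cost(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of lower-is-better metrics already scaled to [0, 1]."""
    total = sum(weights.values())
    return sum(weights[name] * metrics[name] for name in weights) / total


# Hypothetical weighting schemes; the values are placeholders, not the paper's.
SCENARIO_WEIGHTS = {
    "balanced": {"wer": 1.0, "rtf": 1.0, "epas": 1.0},
    "resource_constrained": {"wer": 1.0, "rtf": 1.0, "epas": 2.0},
}

if __name__ == "__main__":
    run = InferenceRun(audio_seconds=60.0, processing_seconds=15.0,
                       energy_joules=300.0, wer=0.10)
    print("RTF:", real_time_factor(run))
    print("EPAS (J/s):", energy_per_audio_second(run))
    # Normalised values below are made up purely to exercise the helper.
    normalised = {"wer": 0.10, "rtf": 0.25, "epas": 0.50}
    for scenario, weights in SCENARIO_WEIGHTS.items():
        print(scenario, weighted_cost(normalised, weights))
```

Under this sketch, a model with higher energy use is penalised more heavily in the resource-constrained scheme than in the balanced one, which is the kind of context-dependent ranking the abstract describes.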
Cite
Text
Ulan et al. "Talk Is Cheap, Energy Is Not: Towards a Green, Context-Aware Metrics Framework for Automatic Speech Recognition." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06118-8_3
Markdown
[Ulan et al. "Talk Is Cheap, Energy Is Not: Towards a Green, Context-Aware Metrics Framework for Automatic Speech Recognition." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/ulan2025ecmlpkdd-talk/) doi:10.1007/978-3-032-06118-8_3
BibTeX
@inproceedings{ulan2025ecmlpkdd-talk,
title = {{Talk Is Cheap, Energy Is Not: Towards a Green, Context-Aware Metrics Framework for Automatic Speech Recognition}},
author = {Ulan, Maria and Husom, Erik Johannes and Van den Abeele, Jeriek},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2025},
pages = {36--54},
doi = {10.1007/978-3-032-06118-8_3},
url = {https://mlanthology.org/ecmlpkdd/2025/ulan2025ecmlpkdd-talk/}
}