Re-Evaluating Minimum Bayes Risk Decoding for Automated Speech Recognition Tasks

Abstract

While sample-based Minimum Bayes Risk (MBR) decoding has been shown to outperform beam search in many text-to-text generation tasks with modern LLMs, beam search remains the dominant approach for Automatic Speech Recognition (ASR) and Speech Translation (ST). To date, the efficacy of MBR decoding in modern speech systems has not been comprehensively evaluated. Given that MBR decoding is effective in text-to-text generation tasks, it is reasonable to expect it to be effective for speech-to-text tasks as well. In this paper, we evaluate MBR decoding for ASR and ST tasks on English and Japanese using Whisper and its derivative models, as well as supplementary autoregressive baselines. We observe that MBR decoding is more accurate than beam search in most of the experimental settings we evaluated. The results show that MBR decoding is a promising method for ASR and ST tasks that require high accuracy.
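
For readers unfamiliar with sample-based MBR decoding, the following is a minimal sketch of the general idea, not the paper's implementation. It assumes the candidate transcripts have already been sampled from a speech model, and it uses negative word error rate as the utility function. The utility choice and the names `neg_wer` and `mbr_decode` are illustrative assumptions. The abstract does not state which utility or sampling setup the paper uses. The sketch also reuses the sampled set as both hypotheses and pseudo-references, which is a common but not the only formulation.

```python
from typing import Callable, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def neg_wer(hyp: str, ref: str) -> float:
    """Illustrative utility (an assumption): negative word error rate."""
    ref_tokens, hyp_tokens = ref.split(), hyp.split()
    if not ref_tokens:
        return 0.0 if not hyp_tokens else -1.0
    return -word_edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def mbr_decode(samples: Sequence[str],
               utility: Callable[[str, str], float] = neg_wer) -> str:
    """Return the sample with the highest average utility against all samples.

    The sampled set serves as both the hypothesis set and the set of
    pseudo-references (each hypothesis is also scored against itself).
    """
    if not samples:
        raise ValueError("samples must be non-empty")

    def expected_utility(hyp: str) -> float:
        return sum(utility(hyp, ref) for ref in samples) / len(samples)

    return max(samples, key=expected_utility)


# Hypothetical sampled transcripts for a single utterance.
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "the cat sat on the mat",
    "a cat sat on the hat",
]
best = mbr_decode(samples)
```

In contrast to beam search, which returns the single highest-probability hypothesis, this procedure returns the hypothesis that agrees most with the other samples under the chosen utility.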

Cite

Text

Jinnai. "Re-Evaluating Minimum Bayes Risk Decoding for Automated Speech Recognition Tasks." Transactions on Machine Learning Research, 2026.

Markdown

[Jinnai. "Re-Evaluating Minimum Bayes Risk Decoding for Automated Speech Recognition Tasks." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/jinnai2026tmlr-reevaluating/)

BibTeX

@article{jinnai2026tmlr-reevaluating,
  title     = {{Re-Evaluating Minimum Bayes Risk Decoding for Automated Speech Recognition Tasks}},
  author    = {Jinnai, Yuu},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/jinnai2026tmlr-reevaluating/}
}