Uncertainty-Aware Self-Training for CTC-Based Automatic Speech Recognition

Abstract

Uncertainty estimation has been widely applied to build trustworthy automatic speech recognition (ASR) systems across the training and inference stages. In the training stage, previous studies show that uncertainty can facilitate self-training by filtering out unlabeled data samples with high uncertainty. However, the current sequence-level uncertainty estimation method for connectionist temporal classification (CTC)-based ASR models discards the output probability information and depends only on the textual distance between decoded predictions. In this study, we argue that this limits the performance improvement and propose a novel output probability-based sequence-level uncertainty estimation method. We also categorize uncertainty into pseudo-label uncertainty and in-training uncertainty for the self-training process. Finally, we present uncertainty-aware self-training for CTC-based ASR models and experimentally show the effectiveness of the proposed method compared to the baselines.

Cite

Text

Kim and Lee. "Uncertainty-Aware Self-Training for CTC-Based Automatic Speech Recognition." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I23.34610

Markdown

[Kim and Lee. "Uncertainty-Aware Self-Training for CTC-Based Automatic Speech Recognition." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/kim2025aaai-uncertainty/) doi:10.1609/AAAI.V39I23.34610

BibTeX

@inproceedings{kim2025aaai-uncertainty,
  title     = {{Uncertainty-Aware Self-Training for CTC-Based Automatic Speech Recognition}},
  author    = {Kim, Eungbeom and Lee, Kyogu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {24330--24338},
  doi       = {10.1609/AAAI.V39I23.34610},
  url       = {https://mlanthology.org/aaai/2025/kim2025aaai-uncertainty/}
}