"Allot?" Is "a Lot!" Towards Developing More Generalized Speech Recognition System for Accessible Communication

Abstract

The proliferation of Automatic Speech Recognition (ASR) systems has revolutionized translation and transcription. However, challenges persist in ensuring inclusive communication for non-native English speakers. This study quantifies the gap between accented and native English speech using Wav2Vec 2.0, a state-of-the-art transformer model. Notably, we found that accented speech exhibits significantly higher word error rates of 30-50%, in contrast to native speakers’ 2-8% (Baevski et al. 2020). Our exploration extends to leveraging accessible online datasets to highlight the potential of enhancing speech recognition by fine-tuning the Wav2Vec 2.0 model. Through experimentation and analysis, we highlight the challenges with training models on accented speech. By refining models and addressing data quality issues, our work presents a pipeline for future investigations aimed at developing an integrated system capable of effectively engaging with a broader range of individuals with diverse backgrounds. Accurate recognition of accented speech is a pivotal step toward democratizing AI-driven communication products.
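The word error rates quoted above can be read against the standard definition of the metric: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. The abstract does not say how the authors computed or normalized their scores, so the following is only a minimal sketch of that standard definition. The function name `word_error_rate` and the lowercase-and-split-on-whitespace normalization are assumptions made for illustration, not details from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace; the paper's own normalization is not stated.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word deleted
                curr[j - 1] + 1,                  # extra hypothesis word inserted
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# The title's own example: hearing "a lot" as "allot" costs one
# substitution plus one deletion against a two-word reference.
print(word_error_rate("a lot", "allot"))
print(word_error_rate("thanks a lot for coming", "thanks allot for coming"))
```

Because the denominator is the reference length, a single misheard phrase in a short utterance moves the score sharply, and the value can exceed 1.0 when the hypothesis contains many inserted words.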

Cite

Text

Bandodkar et al. ""Allot?" Is "a Lot!" Towards Developing More Generalized Speech Recognition System for Accessible Communication." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I21.30381

Markdown

[Bandodkar et al. ""Allot?" Is "a Lot!" Towards Developing More Generalized Speech Recognition System for Accessible Communication." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/bandodkar2024aaai-allot/) doi:10.1609/AAAI.V38I21.30381

BibTeX

@inproceedings{bandodkar2024aaai-allot,
  title     = {{"Allot?" Is "a Lot!" Towards Developing More Generalized Speech Recognition System for Accessible Communication}},
  author    = {Bandodkar, Grisha and Agarwal, Shyam and Sughosh, Athul Krishna and Singh, Sahilbir and Choi, Taeyeong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {23327--23334},
  doi       = {10.1609/AAAI.V38I21.30381},
  url       = {https://mlanthology.org/aaai/2024/bandodkar2024aaai-allot/}
}