Breaking the Low-Resource Barrier for Dagbani ASR: From Data Collection to Modeling
Abstract
Developing Automatic Speech Recognition (ASR) systems requires large amounts of high-quality speech data. However, for low-resource African languages, collecting and annotating such data is difficult because of acute data scarcity and limited funding. As a result, building ASR technologies for these languages remains a daunting task. This paper addresses this challenge for Dagbani, an African language spoken predominantly in Ghana and in parts of northern Togo, by presenting a pipeline for collecting and transcribing a Dagbani audio dataset. We then use the data to build the world’s first ASR system for Dagbani. We hope this methodology can serve as a blueprint for similar efforts in other languages.
Cite
Azunre and Ibrahim. "Breaking the Low-Resource Barrier for Dagbani ASR: From Data Collection to Modeling." ICLR 2023 Workshops: AfricaNLP, 2023.
@inproceedings{azunre2023iclrw-breaking,
title = {{Breaking the Low-Resource Barrier for Dagbani ASR: From Data Collection to Modeling}},
author = {Azunre, Paul and Ibrahim, Naafi Dasana},
booktitle = {ICLR 2023 Workshops: AfricaNLP},
year = {2023},
url = {https://mlanthology.org/iclrw/2023/azunre2023iclrw-breaking/}
}