Multilingual Model and Data Resources for Text-to-Speech in Ugandan Languages

Abstract

We present new resources for text-to-speech in Ugandan languages. Studio-grade recordings in Luganda and English were captured for 2,413 and 2,437 utterances respectively (4,850 utterances in total, representing 5 hours of speech). We show that this is sufficient to train high-quality TTS models that can generate natural-sounding speech in either language, or in combinations of both with code-switching. We also present results on training TTS in Luganda using crowdsourced recordings from Common Voice. Additional data collection is currently underway for the Acholi, Ateso, Lugbara and Runyankole languages. The data we describe is an extension of the SALT dataset, which already contains multi-way parallel translated text in six languages. The dataset and models described are publicly available at https://github.com/SunbirdAI/salt.

Cite

Text

Owomugisha et al. "Multilingual Model and Data Resources for Text-to-Speech in Ugandan Languages." ICLR 2023 Workshops: AfricaNLP, 2023.

Markdown

[Owomugisha et al. "Multilingual Model and Data Resources for Text-to-Speech in Ugandan Languages." ICLR 2023 Workshops: AfricaNLP, 2023.](https://mlanthology.org/iclrw/2023/owomugisha2023iclrw-multilingual/)

BibTeX

@inproceedings{owomugisha2023iclrw-multilingual,
  title     = {{Multilingual Model and Data Resources for Text-to-Speech in Ugandan Languages}},
  author    = {Owomugisha, Isaac and Akera, Benjamin and Mwebaze, Ernest Tonny and Quinn, John},
  booktitle = {ICLR 2023 Workshops: AfricaNLP},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/owomugisha2023iclrw-multilingual/}
}