Machine Translation for African Languages: Community Creation of Datasets and Models in Uganda
Abstract
Reliable machine translation systems are only available for a small proportion of the world’s languages, the key limitation being a shortage of training and evaluation data. We provide a case study in the creation of such resources by NLP teams who are local to the communities in which these languages are spoken. A parallel text corpus, SALT, was created for five Ugandan languages (Luganda, Runyankole, Acholi, Lugbara and Ateso), and various methods were explored to train and evaluate translation models. The resulting models were found to be effective for practical translation applications, even for those languages with no previous NLP data available, achieving a mean BLEU score of 26.2 for translations to English and 19.9 from English. The SALT dataset and models described are publicly available at https://github.com/SunbirdAI/salt.
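The abstract reports corpus-level BLEU scores for the trained models. As a minimal illustrative sketch (not the authors' evaluation code), the snippet below shows how such scores are typically computed with the sacrebleu library; the example hypothesis and reference sentences are hypothetical placeholders.

```python
# Sketch: corpus-level BLEU with sacrebleu, the metric family reported
# in the abstract. Sentences below are placeholders, not SALT data.
import sacrebleu

# Model outputs, one sentence per entry.
hypotheses = [
    "the children are going to school",
    "rain is expected tomorrow afternoon",
]

# One reference stream: a list of reference translations aligned
# with the hypotheses (sacrebleu accepts multiple such streams).
references = [[
    "the children are going to school",
    "rain is expected tomorrow in the afternoon",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")
```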
Cite
Text
Akera et al. "Machine Translation for African Languages: Community Creation of Datasets and Models in Uganda." ICLR 2022 Workshops: AfricaNLP, 2022.

Markdown
[Akera et al. "Machine Translation for African Languages: Community Creation of Datasets and Models in Uganda." ICLR 2022 Workshops: AfricaNLP, 2022.](https://mlanthology.org/iclrw/2022/akera2022iclrw-machine/)

BibTeX
@inproceedings{akera2022iclrw-machine,
title = {{Machine Translation for African Languages: Community Creation of Datasets and Models in Uganda}},
author = {Akera, Benjamin and Mukiibi, Jonathan and Naggayi, Lydia Sanyu and Babirye, Claire and Owomugisha, Isaac and Nsumba, Solomon and Nakatumba-Nabende, Joyce and Bainomugisha, Engineer and Mwebaze, Ernest and Quinn, John},
booktitle = {ICLR 2022 Workshops: AfricaNLP},
year = {2022},
url = {https://mlanthology.org/iclrw/2022/akera2022iclrw-machine/}
}