African Substrates Rather than European Lexifiers to Augment African-Diaspora Creole Translation
Abstract
Machine translation (MT) model training is difficult for low-resource languages, such as African-diaspora Creole languages, because of data scarcity. Cross-lingual data augmentation with knowledge transfer from related high-resource languages is a common technique to overcome this disadvantage. For instance, practitioners may transfer knowledge from a language in the same language family as the low-resource language of interest. African-diaspora Creole languages are low-resource and have simultaneous relationships with multiple language groups. These languages, such as Haitian and Jamaican, are typically lexified by colonial European languages, but they are structurally similar to African languages. We explore the advantages of transferring knowledge from the European lexifier language versus the phylogenetic and typological relatives of the African substrate languages. We analysed Haitian and Jamaican MT in two settings: first controlling tightly for data properties across the compared transfer languages, and then allowing use of all the data we collected. Our inquiry demonstrates a significant advantage in using African transfer languages in some settings.
Cite
Text
Robinson et al. "African Substrates Rather than European Lexifiers to Augment African-Diaspora Creole Translation." ICLR 2023 Workshops: AfricaNLP, 2023.
Markdown
[Robinson et al. "African Substrates Rather than European Lexifiers to Augment African-Diaspora Creole Translation." ICLR 2023 Workshops: AfricaNLP, 2023.](https://mlanthology.org/iclrw/2023/robinson2023iclrw-african/)
BibTeX
@inproceedings{robinson2023iclrw-african,
title = {{African Substrates Rather than European Lexifiers to Augment African-Diaspora Creole Translation}},
author = {Robinson, Nathaniel Romney and Stutzman, Matthew Dean and Richardson, Stephen D. and Mortensen, David R.},
booktitle = {ICLR 2023 Workshops: AfricaNLP},
year = {2023},
url = {https://mlanthology.org/iclrw/2023/robinson2023iclrw-african/}
}