YAD: Leveraging T5 for Improved Automatic Diacritization of Yorùbá Text
Abstract
In this work we present the Yorùbá Automatic Diacritization (YAD) benchmark dataset for evaluating Yorùbá diacritization systems. In addition, we pre-train a text-to-text transformer (T5) model for Yorùbá and show that this model outperforms several multilingually trained T5 models. Lastly, we show that more data and bigger models yield better diacritization for Yorùbá.
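Automatic diacritization is naturally framed as a text-to-text task for a model like T5: the undiacritized string is the input and the fully diacritized string is the target. A minimal sketch (not code from the paper) of how such training pairs can be derived from diacritized Yorùbá text, by stripping combining marks with Python's standard `unicodedata` module:

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove diacritics (tone marks, underdots) from Yorùbá text.

    NFD decomposition separates base characters from combining marks
    (Unicode category 'Mn'), which are then filtered out.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def make_training_pair(diacritized: str) -> tuple[str, str]:
    """Build a (source, target) pair for a text-to-text diacritizer."""
    return strip_diacritics(diacritized), diacritized

src, tgt = make_training_pair("Yorùbá")
print(src, "->", tgt)  # Yoruba -> Yorùbá
```

The model then learns the inverse mapping, restoring tone marks and underdots from plain ASCII-like input.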
Cite
Text
Olawole et al. "YAD: Leveraging T5 for Improved Automatic Diacritization of Yorùbá Text." ICLR 2024 Workshops: AfricaNLP, 2024.
Markdown
[Olawole et al. "YAD: Leveraging T5 for Improved Automatic Diacritization of Yorùbá Text." ICLR 2024 Workshops: AfricaNLP, 2024.](https://mlanthology.org/iclrw/2024/olawole2024iclrw-yad/)
BibTeX
@inproceedings{olawole2024iclrw-yad,
title = {{YAD: Leveraging T5 for Improved Automatic Diacritization of Yorùbá Text}},
author = {Olawole, Akindele Michael and Alabi, Jesujoba Oluwadara and Sakpere, Aderonke Busayo and Adelani, David Ifeoluwa},
booktitle = {ICLR 2024 Workshops: AfricaNLP},
year = {2024},
url = {https://mlanthology.org/iclrw/2024/olawole2024iclrw-yad/}
}