Beyond English: Offensive Language Detection in Low-Resource Nigerian Languages

Abstract

The proliferation of online offensive language necessitates the development of effective detection mechanisms, especially in multilingual contexts. This study addresses the challenge by developing and introducing novel datasets for hate speech detection in three major Nigerian languages: Hausa, Yoruba, and Igbo. We collected data from Twitter and manually annotated it to create datasets for each of the three languages, using native speakers. We used pre-trained language models to evaluate their efficacy in detecting offensive language in our datasets. The best-performing model achieved an accuracy of 90%. To further support research in offensive language detection, we plan to make the dataset and our model publicly available.

Cite

Text

Aliyu et al. "Beyond English: Offensive Language Detection in Low-Resource Nigerian Languages." ICLR 2024 Workshops: AfricaNLP, 2024.

Markdown

[Aliyu et al. "Beyond English: Offensive Language Detection in Low-Resource Nigerian Languages." ICLR 2024 Workshops: AfricaNLP, 2024.](https://mlanthology.org/iclrw/2024/aliyu2024iclrw-beyond/)

BibTeX

@inproceedings{aliyu2024iclrw-beyond,
  title     = {{Beyond English: Offensive Language Detection in Low-Resource Nigerian Languages}},
  author    = {Aliyu, Saminu Mohammad and Wajiga, Gregory Maksha and Murtala, Muhammad and Aliyu, Lukman Jibril},
  booktitle = {ICLR 2024 Workshops: AfricaNLP},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/aliyu2024iclrw-beyond/}
}