NaijaRC: A Multi-Choice Reading Comprehension Dataset for Nigerian Languages

Abstract

In this paper, we create NaijaRC, a new multi-choice Nigerian Reading Comprehension dataset based on high-school reading comprehension (RC) examinations for three Nigerian national languages: Hausa (hau), Igbo (ibo), and Yorùbá (yor). We provide baseline results by performing cross-lingual transfer with several pre-trained encoder-only models, using the Belebele training data, which comes mostly from RACE (a dataset based on English exams for middle and high school Chinese students, very similar to our dataset). Additionally, we provide results by prompting large language models (LLMs) such as GPT-4.

Cite

Text

Aremu et al. "NaijaRC: A Multi-Choice Reading Comprehension Dataset for Nigerian Languages." ICLR 2024 Workshops: AfricaNLP, 2024.

Markdown

[Aremu et al. "NaijaRC: A Multi-Choice Reading Comprehension Dataset for Nigerian Languages." ICLR 2024 Workshops: AfricaNLP, 2024.](https://mlanthology.org/iclrw/2024/aremu2024iclrw-naijarc/)

BibTeX

@inproceedings{aremu2024iclrw-naijarc,
  title     = {{NaijaRC: A Multi-Choice Reading Comprehension Dataset for Nigerian Languages}},
  author    = {Aremu, Anuoluwapo and Alabi, Jesujoba Oluwadara and Abolade, Daud and Aguobi, Nkechinyere Faith and Muhammad, Shamsuddeen Hassan and Adelani, David Ifeoluwa},
  booktitle = {ICLR 2024 Workshops: AfricaNLP},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/aremu2024iclrw-naijarc/}
}