Igbosum1500 - Introducing the Igbo Text Summarization Dataset

Abstract

Igbo, along with Hausa and Yorùbá, is one of the three prominent indigenous Nigerian languages. It is spoken by the Igbo people of southeastern Nigeria, with over 30 million speakers resident in Nigeria and many more abroad. In NLP terms, Igbo is still considered acutely under-resourced, a 'scraping-by' language according to Joshi et al. (2020). Efforts are ongoing to develop IgboNLP, e.g. part-of-speech tagging (Onyenwe et al., 2019), diacritic restoration (Ezeani et al., 2016), embedding-based analogy and similarity (Ezeani et al., 2018), machine translation (Ezeani et al., 2020; Nekoto et al., 2020), and named-entity recognition (Adelani et al., 2021). However, these efforts need to be sustained by creating more resources and expanding coverage of common downstream NLP tasks in Igbo, and one such task is text summarization.

Cite

Text

Mbonu et al. "Igbosum1500 - Introducing the Igbo Text Summarization Dataset." ICLR 2022 Workshops: AfricaNLP, 2022.

Markdown

[Mbonu et al. "Igbosum1500 - Introducing the Igbo Text Summarization Dataset." ICLR 2022 Workshops: AfricaNLP, 2022.](https://mlanthology.org/iclrw/2022/mbonu2022iclrw-igbosum1500/)

BibTeX

@inproceedings{mbonu2022iclrw-igbosum1500,
  title     = {{Igbosum1500 - Introducing the Igbo Text Summarization Dataset}},
  author    = {Mbonu, Chinedu Emmanuel and Chukwuneke, Chiamaka Ijeoma and Paul, Roseline Uzoamaka and Ezeani, Ignatius and Onyenwe, Ikechukwu},
  booktitle = {ICLR 2022 Workshops: AfricaNLP},
  year      = {2022},
  url       = {https://mlanthology.org/iclrw/2022/mbonu2022iclrw-igbosum1500/}
}