Igboner 2.0: Expanding Named Entity Recognition Datasets via Projection

Abstract

Since the advent of state-of-the-art neural network models in natural language processing research, a major challenge for low-resource languages has been the lack or insufficiency of annotated training data. The named entity recognition (NER) task is no exception. The need for an efficient data creation and annotation process, especially for low-resource languages, cannot be overemphasized. In this work, we leverage an existing NER tool for English in a cross-language projection method that uses a parallel English-Igbo corpus to automatically create a mapping dictionary of entities in the source language and their translations in the target language. The resulting mapping dictionary, which was manually checked and corrected by human annotators, was used to automatically generate and format an NER training dataset from the Igbo monolingual corpus, thereby saving considerable annotation time for the Igbo NER task. The generated dataset was also included in the training process, and our experiments show improved performance over previous work.
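The dataset-generation step described above can be illustrated with a minimal sketch. It assumes the checked mapping dictionary is available as a lookup from target-language entity token sequences to entity types, and that monolingual sentences are tagged by longest-match lookup and written in a BIO, one-token-per-line layout. The names `ENTITY_DICT`, `tag_tokens`, and `to_conll`, the example entries, the matching strategy, and the output layout are all assumptions made for illustration; the abstract does not specify them, and the projection step that builds the dictionary from the parallel corpus is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping dictionary: entity token sequence -> entity type.
# These entries are illustrative placeholders, not data from the paper.
ENTITY_DICT: Dict[Tuple[str, ...], str] = {
    ("Chinua", "Achebe"): "PER",
    ("Enugu",): "LOC",
}


def tag_tokens(
    tokens: List[str], entity_dict: Dict[Tuple[str, ...], str]
) -> List[Tuple[str, str]]:
    """Assign BIO tags by greedy longest-match lookup in the dictionary."""
    max_len = max((len(key) for key in entity_dict), default=0)
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        match_len, label = 0, ""
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in entity_dict:
                match_len, label = n, entity_dict[span]
                break
        if match_len:
            tagged.append((tokens[i], f"B-{label}"))
            for tok in tokens[i + 1:i + match_len]:
                tagged.append((tok, f"I-{label}"))
            i += match_len
        else:
            tagged.append((tokens[i], "O"))
            i += 1
    return tagged


def to_conll(tagged: List[Tuple[str, str]]) -> str:
    """Format one tagged sentence as tab-separated token/tag lines."""
    return "\n".join(f"{tok}\t{tag}" for tok, tag in tagged)


if __name__ == "__main__":
    # Placeholder sentence, already tokenized on whitespace.
    sentence = ["Chinua", "Achebe", "bi", "na", "Enugu"]
    print(to_conll(tag_tokens(sentence, ENTITY_DICT)))
```

Exact string matching of this kind would miss spelling and diacritic variants, which is one reason a manually checked dictionary matters before generating training data at scale.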

Cite

Text

Chukwuneke et al. "Igboner 2.0: Expanding Named Entity Recognition Datasets via Projection." ICLR 2023 Workshops: AfricaNLP, 2023.

Markdown

[Chukwuneke et al. "Igboner 2.0: Expanding Named Entity Recognition Datasets via Projection." ICLR 2023 Workshops: AfricaNLP, 2023.](https://mlanthology.org/iclrw/2023/chukwuneke2023iclrw-igboner/)

BibTeX

@inproceedings{chukwuneke2023iclrw-igboner,
  title     = {{Igboner 2.0: Expanding Named Entity Recognition Datasets via Projection}},
  author    = {Chukwuneke, Chiamaka Ijeoma and Rayson, Paul and Ezeani, Ignatius and El-Haj, Mo and Asogwa, Doris Chinedu and Okpalla, Chidimma Lilian and Mbonu, Chinedu Emmanuel},
  booktitle = {ICLR 2023 Workshops: AfricaNLP},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/chukwuneke2023iclrw-igboner/}
}