Leveraging Multi-Token Entities in Document-Level Named Entity Recognition

Anwen Hu, Zhicheng Dou, Jian-Yun Nie, Ji-Rong Wen

AAAI 2020 pp. 7961-7968

doi:10.1609/AAAI.V34I05.6304

Abstract

Most state-of-the-art named entity recognition (NER) systems process each sentence within a document independently. Such systems easily confuse entity types when the context within a sentence is insufficient. To exploit context from the whole document, most document-level work leaves neural networks to learn cross-sentence relations on their own, which is not intuitive to humans. In this paper, we divide entities into multi-token entities, which contain multiple tokens, and single-token entities, which consist of a single token. We propose that the context information of multi-token entities should be more reliable in document-level NER for news articles. We design a fusion attention mechanism that not only learns the semantic relevance between occurrences of the same token but also focuses more on occurrences belonging to multi-token entities. To identify multi-token entities, we design an auxiliary task, ‘Multi-token Entity Classification’, and perform it simultaneously with document-level NER. This auxiliary task is simplified from NER and requires no extra annotation. Experimental results on the CoNLL-2003 dataset and the OntoNotesnbm dataset show that our model outperforms state-of-the-art sentence-level and document-level NER methods.
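The abstract notes that the auxiliary task needs no extra annotation because it is simplified from NER. One way to picture this is to derive a per-token binary label directly from existing NER tags. The sketch below is only an illustration under assumptions not stated in the abstract: the tags follow a BIO scheme, the target is 1 for tokens inside an entity that spans two or more tokens and 0 otherwise, and the function name `multi_token_labels` is hypothetical. The paper's actual label scheme may differ.

```python
from typing import List


def multi_token_labels(tags: List[str]) -> List[int]:
    """Hypothetical helper: mark tokens that belong to a multi-token entity.

    Assumes BIO-style tags such as "B-PER", "I-PER", "O". Returns 1 for
    every token inside an entity spanning two or more tokens, else 0.
    """
    labels = [0] * len(tags)
    start = None  # index where the currently open entity began

    # A trailing "O" sentinel closes an entity that ends the sentence.
    for i, tag in enumerate(tags + ["O"]):
        # The open entity continues only on an I- tag of the same type.
        if (
            start is not None
            and tag.startswith("I-")
            and tag[2:] == tags[start][2:]
        ):
            continue

        # Otherwise the open entity (if any) ended at position i - 1.
        if start is not None:
            if i - start >= 2:
                for j in range(start, i):
                    labels[j] = 1
            start = None

        # Any B- tag (or a stray I- tag) opens a new entity.
        if tag.startswith("B-") or tag.startswith("I-"):
            start = i

    return labels


example_tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG", "I-ORG", "I-ORG"]
print(multi_token_labels(example_tags))  # [1, 1, 0, 0, 0, 1, 1, 1]
```

Here the single-token `B-LOC` entity receives label 0, while both multi-token spans receive label 1, so the targets come entirely from the NER annotation already present in the data.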


Cite

Text

Hu et al. "Leveraging Multi-Token Entities in Document-Level Named Entity Recognition." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I05.6304

Markdown

[Hu et al. "Leveraging Multi-Token Entities in Document-Level Named Entity Recognition." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/hu2020aaai-leveraging/) doi:10.1609/AAAI.V34I05.6304

BibTeX

@inproceedings{hu2020aaai-leveraging,
  title     = {{Leveraging Multi-Token Entities in Document-Level Named Entity Recognition}},
  author    = {Hu, Anwen and Dou, Zhicheng and Nie, Jian-Yun and Wen, Ji-Rong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {7961-7968},
  doi       = {10.1609/AAAI.V34I05.6304},
  url       = {https://mlanthology.org/aaai/2020/hu2020aaai-leveraging/}
}