Why Attention? Analyze BiLSTM Deficiency and Its Remedies in the Case of NER
Abstract
BiLSTM has been prevalently used as a core module for NER in a sequence-labeling setup. State-of-the-art approaches use BiLSTM with additional resources such as gazetteers, language-modeling, or multi-task supervision to further improve NER. This paper instead takes a step back and focuses on analyzing problems of BiLSTM itself and how exactly self-attention can bring improvements. We formally show the limitation of (CRF-)BiLSTM in modeling cross-context patterns for each word – the XOR limitation. Then, we show that two types of simple cross-structures – self-attention and Cross-BiLSTM – can effectively remedy the problem. We test the practical impacts of the deficiency on real-world NER datasets, OntoNotes 5.0 and WNUT 2017, with clear and consistent improvements over the baseline, up to 8.7% on some of the multi-token entity mentions. We give in-depth analyses of the improvements across several aspects of NER, especially the identification of multi-token mentions. This study should lay a sound foundation for future improvements on sequence-labeling NER.
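To make the idea of a "cross-structure" concrete, the following is a minimal sketch (not the paper's exact architecture) of a sequence-labeling tagger that places a self-attention layer on top of BiLSTM outputs, so every token can directly combine evidence from both sides of its context before tag classification. All class names, dimensions, and hyperparameters here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMAttnTagger(nn.Module):
    """Illustrative BiLSTM + self-attention tagger (hypothetical, not the paper's exact model)."""

    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=200, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # BiLSTM: each position gets one forward and one backward context summary.
        self.bilstm = nn.LSTM(emb_dim, hidden_dim // 2,
                              batch_first=True, bidirectional=True)
        # Self-attention over the BiLSTM outputs: a simple cross-structure that
        # lets each token attend to all other positions at once.
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        h, _ = self.bilstm(self.embed(token_ids))           # (B, T, hidden_dim)
        a, _ = self.attn(h, h, h)                           # attended cross-context
        return self.classifier(torch.cat([h, a], dim=-1))   # per-token tag scores

# Usage: a random batch of 2 sentences, 7 tokens each, over a toy vocabulary.
model = BiLSTMAttnTagger(vocab_size=1000, num_tags=9)
scores = model(torch.randint(0, 1000, (2, 7)))
print(scores.shape)  # torch.Size([2, 7, 9])
```

A CRF layer or a Cross-BiLSTM stack could replace the attention component in the same position; the point of the sketch is only where the cross-context mixing happens relative to the per-token classifier.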
Cite
Text
Li et al. "Why Attention? Analyze BiLSTM Deficiency and Its Remedies in the Case of NER." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I05.6338
Markdown
[Li et al. "Why Attention? Analyze BiLSTM Deficiency and Its Remedies in the Case of NER." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/li2020aaai-attention/) doi:10.1609/AAAI.V34I05.6338
BibTeX
@inproceedings{li2020aaai-attention,
title = {{Why Attention? Analyze BiLSTM Deficiency and Its Remedies in the Case of NER}},
author = {Li, Peng-Hsuan and Fu, Tsu-Jui and Ma, Wei-Yun},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2020},
pages = {8236-8244},
doi = {10.1609/AAAI.V34I05.6338},
url = {https://mlanthology.org/aaai/2020/li2020aaai-attention/}
}