A Unified Model for Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media

Abstract

Named entity recognition (NER) in Chinese social media is important but difficult because the text is informal and noisy. Previous methods focus only on in-domain supervised learning, which is limited by the scarcity of annotated data. However, ample corpora exist in formal domains, along with massive amounts of unannotated in-domain text, and both can be used to improve the task. We propose a unified model that learns from out-of-domain corpora and in-domain unannotated text. The unified model contains two major functions: one for cross-domain learning and one for semi-supervised learning. The cross-domain learning function learns out-of-domain information based on domain similarity. The semi-supervised learning function learns from in-domain unannotated text by self-training. Each learning function outperforms existing methods for NER in Chinese social media. Finally, our unified model yields nearly 11% absolute improvement over previously published results.
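
The abstract names two signals, domain similarity for out-of-domain corpora and self-training on unannotated in-domain text, without giving formulas. The sketch below is one possible way to combine them as per-sentence training weights; it is not the paper's implementation. The `Sentence` type, the `training_weight` function, the character-count cosine similarity, and the use of tagger confidence as a weight are all illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class Sentence:
    chars: List[str]          # character sequence of the sentence
    source: str               # "gold", "out_of_domain", or "self_labeled"
    confidence: float = 1.0   # tagger confidence; only used for self-labeled data


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def training_weight(sentence: Sentence, in_domain_profile: Counter) -> float:
    """Hypothetical per-sentence weight, e.g. a multiplier on the loss.

    Assumption: gold in-domain data gets full weight, out-of-domain data is
    scaled by its similarity to the in-domain character profile, and
    self-labeled data is scaled by the tagger's confidence in its own labels.
    """
    if sentence.source == "gold":
        return 1.0
    if sentence.source == "out_of_domain":
        return cosine(Counter(sentence.chars), in_domain_profile)
    if sentence.source == "self_labeled":
        return sentence.confidence
    raise ValueError(f"unknown source: {sentence.source}")


# Toy in-domain profile built from a single sentence (illustrative only).
profile = Counter("今天去北京玩")
batch = [
    Sentence(list("今天去北京玩"), "gold"),
    Sentence(list("北京今天"), "out_of_domain"),
    Sentence(list("去上海玩"), "self_labeled", confidence=0.8),
]
weights = [training_weight(s, profile) for s in batch]
```

In a real tagger, each weight would scale the loss or learning rate of the corresponding sentence, so that out-of-domain sentences resembling social media text and confidently self-labeled sentences contribute more to training.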

Cite

Text

He and Sun. "A Unified Model for Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.10977

Markdown

[He and Sun. "A Unified Model for Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/he2017aaai-unified/) doi:10.1609/AAAI.V31I1.10977

BibTeX

@inproceedings{he2017aaai-unified,
  title     = {{A Unified Model for Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media}},
  author    = {He, Hangfeng and Sun, Xu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {3216--3222},
  doi       = {10.1609/AAAI.V31I1.10977},
  url       = {https://mlanthology.org/aaai/2017/he2017aaai-unified/}
}