Locating Complex Named Entities in Web Text

Downey, Doug; Broadhead, Matthew; Etzioni, Oren

IJCAI 2007 pp. 2733-2739

Abstract

Named Entity Recognition (NER) is the task of locating and classifying names in text. In previous work, NER was limited to a small number of pre-defined entity classes (e.g., people, locations, and organizations). However, NER on the Web is a far more challenging problem. Complex names (e.g., film or book titles) can be very difficult to pick out precisely from text. Further, the Web contains a wide variety of entity classes, which are not known in advance. Thus, hand-tagging examples of each entity class is impractical. This paper investigates a novel approach to the first step in Web NER: locating complex named entities in Web text. Our key observation is that named entities can be viewed as a species of multi-word units, which can be detected by accumulating n-gram statistics over the Web corpus. We show that this statistical method's F1 score is 50% higher than that of supervised techniques including Conditional Random Fields (CRFs) and Conditional Markov Models (CMMs) when applied to complex names. The method also outperforms CMMs and CRFs by 117% on entity classes absent from the training data. Finally, our method outperforms a semi-supervised CRF by 73%.

URL: http://www.cs.washington.edu/homes/ddowney/papers/ddowneyijcai2007_lex.pdf
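
To make the abstract's key observation concrete, the following is a minimal sketch of detecting multi-word names from n-gram counts. It is not the authors' implementation: the abstract does not say which statistic is used, so the association score here (symmetric conditional probability), the threshold of 0.5, the toy corpus, and the function names `ngram_counts`, `scp`, and `locate_names` are all assumptions chosen for illustration.

```python
from collections import Counter

def ngram_counts(sentences):
    """Accumulate unigram and bigram counts over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def scp(a, b, unigrams, bigrams):
    """Assumed association score: P(b|a) * P(a|b), estimated from raw counts."""
    if unigrams[a] == 0 or unigrams[b] == 0:
        return 0.0
    return bigrams[(a, b)] ** 2 / (unigrams[a] * unigrams[b])

def locate_names(tokens, unigrams, bigrams, threshold=0.5):
    """Start at a capitalized token and extend while adjacent tokens cohere."""
    names, i = [], 0
    while i < len(tokens):
        if not tokens[i][0].isupper():
            i += 1
            continue
        j = i
        while (j + 1 < len(tokens)
               and scp(tokens[j], tokens[j + 1], unigrams, bigrams) >= threshold):
            j += 1
        # A name should not end on a lowercase word, so trim any trailing ones.
        while j > i and not tokens[j][0].isupper():
            j -= 1
        names.append(" ".join(tokens[i:j + 1]))
        i = j + 1
    return names

# Hypothetical toy corpus standing in for Web-scale counts.
sentences = [
    "we saw Dances with Wolves on Friday".split(),
    "critics praised Dances with Wolves".split(),
    "he walks with friends on Monday".split(),
]
unigrams, bigrams = ngram_counts(sentences)
print(locate_names(sentences[0], unigrams, bigrams))
# → ['Dances with Wolves', 'Friday']
```

The lowercase word "with" is absorbed into the name because it co-occurs strongly with its capitalized neighbours, while "Wolves on" falls below the threshold. On a corpus this small, pairs seen only once receive inflated scores, so a realistic setting would need far larger counts or a minimum-frequency cutoff.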

Cite

Text

Downey et al. "Locating Complex Named Entities in Web Text." International Joint Conference on Artificial Intelligence, 2007.

Markdown

[Downey et al. "Locating Complex Named Entities in Web Text." International Joint Conference on Artificial Intelligence, 2007.](https://mlanthology.org/ijcai/2007/downey2007ijcai-locating/)

BibTeX

@inproceedings{downey2007ijcai-locating,
  title     = {{Locating Complex Named Entities in Web Text}},
  author    = {Downey, Doug and Broadhead, Matthew and Etzioni, Oren},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2007},
  pages     = {2733-2739},
  url       = {https://mlanthology.org/ijcai/2007/downey2007ijcai-locating/}
}