A Probabilistic Model of Redundancy in Information Extraction

Downey, Doug; Etzioni, Oren; Soderland, Stephen

doi:10.21236/ada454763

IJCAI 2005 pp. 1034-1041

Abstract

Unsupervised Information Extraction (UIE) is the task of extracting knowledge from text without using hand-tagged training examples. A fundamental problem for both UIE and supervised IE is assessing the probability that extracted information is correct. In massive corpora such as the Web, the same extraction is found repeatedly in different documents. How does this redundancy impact the probability of correctness? This paper introduces a combinatorial balls-and-urns model that computes the impact of sample size, redundancy, and corroboration from multiple distinct extraction rules on the probability that an extraction is correct. We describe methods for estimating the model's parameters in practice and demonstrate experimentally that for UIE the model's log likelihoods are 15 times better, on average, than those obtained by Pointwise Mutual Information (PMI) and the noisy-or model used in previous work. For supervised IE, the model's performance is comparable to that of Support Vector Machines and Logistic Regression.
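To make the contrast in the abstract concrete, the following is a minimal sketch of a single-urn calculation next to a noisy-or baseline. It is an illustration under stated assumptions, not the paper's estimator: the function names `noisy_or` and `urn_posterior` are hypothetical, the number of balls per label is assumed to be known (in practice it would have to be estimated), and draws are assumed to be made with replacement.

```python
from math import comb


def noisy_or(k: int, p: float) -> float:
    """Noisy-or baseline: probability that an extraction seen k times is
    correct, assuming each sighting is independently right with probability p.
    Note that the sample size n plays no role here."""
    return 1.0 - (1.0 - p) ** k


def urn_posterior(k: int, n: int, correct_counts, error_counts) -> float:
    """Posterior probability that a label drawn k times in n draws (with
    replacement) is a correct label.

    correct_counts / error_counts list the number of balls in the urn for
    each distinct correct / erroneous label (assumed known in this sketch).
    """
    s = sum(correct_counts) + sum(error_counts)  # total balls in the urn

    def likelihood(r: int) -> float:
        # Binomial probability of seeing a label with r balls exactly k times.
        q = r / s
        return comb(n, k) * q ** k * (1.0 - q) ** (n - k)

    num = sum(likelihood(r) for r in correct_counts)
    den = num + sum(likelihood(r) for r in error_counts)
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical urn: 5 correct labels with 10 balls each,
    # 50 error labels with 1 ball each.
    correct, errors = [10] * 5, [1] * 50
    for k in (1, 10):
        print(k, noisy_or(k, 0.9), urn_posterior(k, 100, correct, errors))
```

In this toy urn, a label seen once in 100 draws is far more likely to be an error than a correct label, whereas noisy-or assigns it the same probability regardless of how many draws were made. That sensitivity to sample size is the kind of effect the abstract attributes to the urn model.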

Cite

Text

Downey et al. "A Probabilistic Model of Redundancy in Information Extraction." International Joint Conference on Artificial Intelligence, 2005. doi:10.21236/ada454763

Markdown

[Downey et al. "A Probabilistic Model of Redundancy in Information Extraction." International Joint Conference on Artificial Intelligence, 2005.](https://mlanthology.org/ijcai/2005/downey2005ijcai-probabilistic/) doi:10.21236/ada454763

BibTeX

@inproceedings{downey2005ijcai-probabilistic,
  title     = {{A Probabilistic Model of Redundancy in Information Extraction}},
  author    = {Downey, Doug and Etzioni, Oren and Soderland, Stephen},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2005},
  pages     = {1034--1041},
  doi       = {10.21236/ada454763},
  url       = {https://mlanthology.org/ijcai/2005/downey2005ijcai-probabilistic/}
}