Collective Information Extraction with Context-Specific Consistencies

Abstract

Conditional Random Fields (CRFs) have been widely used for information extraction from free text as well as from semi-structured documents. Interesting entities in semi-structured domains are often consistently structured within a certain context or document. However, their actual compositions vary and are possibly inconsistent among different contexts. We present two collective information extraction approaches based on CRFs for exploiting these context-specific consistencies. The first approach extends linear-chain CRFs by additional factors specified by a classifier, which learns such consistencies during inference. In a second extended approach, we propose a variant of skip-chain CRFs, which enables the model to transfer long-range evidence about the consistency of the entities. The practical relevance of the presented work for real-world information extraction systems is highlighted in an empirical study. Both approaches achieve a considerable error reduction.

Cite

Text

Klügl et al. "Collective Information Extraction with Context-Specific Consistencies." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012. doi:10.1007/978-3-642-33460-3_52

Markdown

[Klügl et al. "Collective Information Extraction with Context-Specific Consistencies." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012.](https://mlanthology.org/ecmlpkdd/2012/klugl2012ecmlpkdd-collective/) doi:10.1007/978-3-642-33460-3_52

BibTeX

@inproceedings{klugl2012ecmlpkdd-collective,
  title     = {{Collective Information Extraction with Context-Specific Consistencies}},
  author    = {Klügl, Peter and Toepfer, Martin and Lemmerich, Florian and Hotho, Andreas and Puppe, Frank},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2012},
  pages     = {728--743},
  doi       = {10.1007/978-3-642-33460-3_52},
  url       = {https://mlanthology.org/ecmlpkdd/2012/klugl2012ecmlpkdd-collective/}
}