A Probabilistic Learning Method for XML Annotation of Documents

Abstract

We consider the problem of semantic annotation of semi-structured documents according to a target XML schema. The task is to annotate a document in a tree-like manner, where the annotation tree is an instance of a tree class defined by a DTD or W3C XML Schema description. In the probabilistic setting, we cast the tree annotation problem as generalized probabilistic context-free parsing of an observation sequence, where each observation comes with a probability distribution over terminals supplied by a probabilistic classifier applied to the content of the documents. We determine the most probable tree annotation by maximizing the joint probability of selecting a terminal sequence for the observation sequence and the most probable parse for the selected terminal sequence.
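The joint maximization described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes the schema has already been converted to a probabilistic context-free grammar in Chomsky normal form, and the function name `best_annotation`, the dictionary layouts, and the toy grammar are all hypothetical. The only change from a standard Viterbi CYK parser is in the base case, where each observation contributes a classifier distribution over terminals rather than a single known terminal.

```python
from collections import defaultdict


def best_annotation(observations, lexical, binary, start="S"):
    """Viterbi CYK over a CNF grammar with uncertain terminals (illustrative).

    observations: list of {terminal: p(terminal | observation)}
    lexical:      {(A, terminal): P(A -> terminal)}
    binary:       {(A, B, C): P(A -> B C)}
    Returns (probability, tree), or (0.0, None) if no parse exists.
    """
    n = len(observations)
    chart = defaultdict(dict)  # (i, j) -> {nonterminal: (prob, tree)}

    # Base case: choose, per nonterminal, the terminal that maximizes
    # classifier probability times lexical rule probability.
    for i, dist in enumerate(observations):
        cell = chart[(i, i + 1)]
        for (a, t), p_rule in lexical.items():
            p = dist.get(t, 0.0) * p_rule
            if p > cell.get(a, (0.0, None))[0]:
                cell[a] = (p, (a, t))

    # Recursive case: standard max-product combination of adjacent spans.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = chart[(i, j)]
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for (a, b, c), p_rule in binary.items():
                    if b in left and c in right:
                        p = p_rule * left[b][0] * right[c][0]
                        if p > cell.get(a, (0.0, None))[0]:
                            cell[a] = (p, (a, left[b][1], right[c][1]))

    return chart[(0, n)].get(start, (0.0, None))


# Toy grammar (hypothetical): a document S is a title T followed by a paragraph P.
lexical = {("T", "title"): 1.0, ("P", "para"): 1.0}
binary = {("S", "T", "P"): 1.0}

# The classifier prefers "para" for both observations, but "para para"
# has no parse, so the joint maximum selects "title para" instead.
observations = [
    {"title": 0.4, "para": 0.6},
    {"title": 0.3, "para": 0.7},
]

prob, tree = best_annotation(observations, lexical, binary)
print(prob, tree)
```

In this toy case, picking the most probable terminal for each observation independently yields a sequence the grammar rejects, whereas the joint search returns the tree `('S', ('T', 'title'), ('P', 'para'))` with probability of about 0.28. The sketch runs in time cubic in the number of observations, as ordinary CYK does.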

Cite

Text

Chidlovskii and Fuselier. "A Probabilistic Learning Method for XML Annotation of Documents." International Joint Conference on Artificial Intelligence, 2005.

Markdown

[Chidlovskii and Fuselier. "A Probabilistic Learning Method for XML Annotation of Documents." International Joint Conference on Artificial Intelligence, 2005.](https://mlanthology.org/ijcai/2005/chidlovskii2005ijcai-probabilistic/)

BibTeX

@inproceedings{chidlovskii2005ijcai-probabilistic,
  title     = {{A Probabilistic Learning Method for XML Annotation of Documents}},
  author    = {Chidlovskii, Boris and Fuselier, Jérôme},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2005},
  pages     = {1016-1021},
  url       = {https://mlanthology.org/ijcai/2005/chidlovskii2005ijcai-probabilistic/}
}