Using an Annotated Language Corpus as a Virtual Stochastic Grammar

Abstract

In Data Oriented Parsing (DOP), an annotated language corpus is used as a virtual stochastic grammar. An input string is parsed by combining subtrees from the corpus. As a consequence, one parse tree can usually be generated by several derivations that involve different subtrees. This leads to a statistical model in which the probability of a parse is equal to the sum of the probabilities of all its derivations. Scha (1990) gives an informal introduction to DOP, while Bod (1992) provides a formalization of the theory. In this paper we show that the maximum probability parse can be estimated in polynomial time by applying Monte Carlo techniques. The model was tested on a set of hand-parsed strings from the Air Travel Information System (ATIS) corpus. Preliminary experiments yield 96% parsing accuracy on the test set.
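The relationship between derivations and parses, and the idea of estimating the most probable parse by sampling, can be illustrated with a small Python sketch. Everything concrete in it is an assumption for illustration: the subtree ids, their probabilities, the derivation list, and the function names (`derivation_prob`, `parse_probs`, `monte_carlo_mpp`) are hypothetical toy data and helpers, not taken from the paper. The sketch assumes that a derivation's probability is the product of the probabilities of the subtrees it combines, and it samples from an explicitly enumerated list of derivations, whereas a real DOP parser would sample random derivations from a chart (derivation forest) built over the corpus subtrees.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical subtree probabilities (made-up numbers, not from the paper).
SUBTREE_PROB = {"t1": 0.25, "t2": 0.5, "t3": 0.4, "t4": 0.3}

# Hypothetical derivations for one input string: each pairs the sequence of
# corpus subtrees that are combined with the parse tree they produce.
DERIVATIONS = [
    (("t1",), "parse_A"),        # parse_A from a single large subtree
    (("t2", "t3"), "parse_A"),   # parse_A again, from two smaller subtrees
    (("t4",), "parse_B"),        # parse_B has only one derivation
]


def derivation_prob(subtrees):
    """Assumed derivation probability: product of its subtree probabilities."""
    return math.prod(SUBTREE_PROB[t] for t in subtrees)


def parse_probs(derivations):
    """Probability of each parse: the sum over all derivations that yield it."""
    totals = defaultdict(float)
    for subtrees, parse in derivations:
        totals[parse] += derivation_prob(subtrees)
    return dict(totals)


def most_probable_derivation_parse(derivations):
    """Parse produced by the single most probable derivation (for contrast)."""
    _, parse = max(derivations, key=lambda d: derivation_prob(d[0]))
    return parse


def monte_carlo_mpp(derivations, n_samples=10_000, seed=0):
    """Estimate the most probable parse: sample derivations in proportion to
    their probability and return the parse that occurs most often."""
    rng = random.Random(seed)
    weights = [derivation_prob(subtrees) for subtrees, _ in derivations]
    sampled = rng.choices(derivations, weights=weights, k=n_samples)
    counts = Counter(parse for _, parse in sampled)
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(parse_probs(DERIVATIONS))
    print(most_probable_derivation_parse(DERIVATIONS))
    print(monte_carlo_mpp(DERIVATIONS))
```

In this toy data the single most probable derivation (probability 0.3) yields parse_B, yet parse_A is the more probable parse once its two derivations are summed (0.25 + 0.2 = 0.45). Sampling derivations and counting parses recovers parse_A without enumerating every derivation's contribution exactly, which is the intuition behind the Monte Carlo estimate.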

Cite

Text

Bod. "Using an Annotated Language Corpus as a Virtual Stochastic Grammar." AAAI Conference on Artificial Intelligence, 1993.

Markdown

[Bod. "Using an Annotated Language Corpus as a Virtual Stochastic Grammar." AAAI Conference on Artificial Intelligence, 1993.](https://mlanthology.org/aaai/1993/bod1993aaai-using/)

BibTeX

@inproceedings{bod1993aaai-using,
  title     = {{Using an Annotated Language Corpus as a Virtual Stochastic Grammar}},
  author    = {Bod, Rens},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {1993},
  pages     = {778--783},
  url       = {https://mlanthology.org/aaai/1993/bod1993aaai-using/}
}