Parsing a Natural Language Using Mutual Information Statistics

Abstract

The purpose of this paper is to characterize a constituent boundary parsing algorithm, using an information-theoretic measure called generalized mutual information, which serves as an alternative to traditional grammar-based parsing methods. This method is based on the hypothesis that constituent boundaries can be extracted from a given sentence (or word sequence) by analyzing the mutual information values of the part-of-speech n-grams within the sentence. This hypothesis is supported by the performance of an implementation of this parsing algorithm which determines a recursive unlabeled bracketing of unrestricted English text with a relatively low error rate. This paper derives the generalized mutual information statistic, describes the parsing algorithm, and presents results and sample output from the parser.

Introduction

A standard approach to parsing a natural language is to characterize the language using a set of rules, a grammar. A grammar-based parsing algori...
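To make the hypothesis in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It uses ordinary pointwise mutual information over adjacent part-of-speech tag pairs (bigrams) rather than the generalized mutual information statistic the paper derives, and it proposes only a single split point rather than a recursive bracketing. The function names `bigram_pmi` and `weakest_boundary`, the tag labels, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tagged_corpus):
    """Estimate pointwise mutual information for adjacent tag pairs.

    tagged_corpus is a list of sentences, each a list of part-of-speech tags.
    Returns a function pmi(x, y) giving log2(P(x, y) / (P(x) * P(y))).
    This is plain bigram PMI, an assumed stand-in for the paper's
    generalized mutual information.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tags in tagged_corpus:
        unigrams.update(tags)
        bigrams.update(zip(tags, tags[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def pmi(x, y):
        # Unseen pairs get the lowest possible score.
        if bigrams[(x, y)] == 0:
            return float("-inf")
        p_xy = bigrams[(x, y)] / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        return math.log2(p_xy / (p_x * p_y))

    return pmi


def weakest_boundary(tags, pmi):
    """Return i such that the gap between tags[i] and tags[i + 1]
    has the lowest PMI, i.e. the most likely constituent boundary
    under this simplified reading of the hypothesis."""
    scores = [pmi(a, b) for a, b in zip(tags, tags[1:])]
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy corpus of tag sequences (not data from the paper).
corpus = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "NOUN", "VERB"],
    ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"],
]
pmi = bigram_pmi(corpus)
split = weakest_boundary(["DET", "NOUN", "VERB", "DET", "NOUN"], pmi)
```

On this toy corpus the lowest-scoring gap falls between the verb and the following determiner, which is where the object noun phrase begins. Applying the same split recursively to each side would give an unlabeled bracketing in the spirit of the abstract, though the paper's actual statistic and procedure differ from this sketch.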

Cite

Text

Magerman and Marcus. "Parsing a Natural Language Using Mutual Information Statistics." AAAI Conference on Artificial Intelligence, 1990.

Markdown

[Magerman and Marcus. "Parsing a Natural Language Using Mutual Information Statistics." AAAI Conference on Artificial Intelligence, 1990.](https://mlanthology.org/aaai/1990/magerman1990aaai-parsing/)

BibTeX

@inproceedings{magerman1990aaai-parsing,
  title     = {{Parsing a Natural Language Using Mutual Information Statistics}},
  author    = {Magerman, David M. and Marcus, Mitchell P.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {1990},
  pages     = {984-989},
  url       = {https://mlanthology.org/aaai/1990/magerman1990aaai-parsing/}
}