Inducing Head-Driven PCFGs with Latent Heads: Refining a Tree-Bank Grammar for Parsing

Abstract

Although state-of-the-art parsers for natural language are lexicalized, it was recently shown that an accurate unlexicalized parser for the Penn tree-bank can be simply read off a manually refined tree-bank. While lexicalized parsers often suffer from sparse data, manual mark-up is costly and largely based on individual linguistic intuition. Thus, across domains, languages, and tree-bank annotations, a fundamental question arises: Is it possible to automatically induce an accurate parser from a tree-bank without resorting to full lexicalization? In this paper, we show how to induce a probabilistic parser with latent head information from simple linguistic principles. Our parser has a performance of 85.1% (LP/LR F_1), which is as good as that of early lexicalized ones. This is remarkable since the induction of probabilistic grammars is in general a hard task.

Cite

Text

Prescher. "Inducing Head-Driven PCFGs with Latent Heads: Refining a Tree-Bank Grammar for Parsing." European Conference on Machine Learning, 2005. doi:10.1007/11564096_30

Markdown

[Prescher. "Inducing Head-Driven PCFGs with Latent Heads: Refining a Tree-Bank Grammar for Parsing." European Conference on Machine Learning, 2005.](https://mlanthology.org/ecmlpkdd/2005/prescher2005ecml-inducing/) doi:10.1007/11564096_30

BibTeX

@inproceedings{prescher2005ecml-inducing,
  title     = {{Inducing Head-Driven PCFGs with Latent Heads: Refining a Tree-Bank Grammar for Parsing}},
  author    = {Prescher, Detlef},
  booktitle = {European Conference on Machine Learning},
  year      = {2005},
  pages     = {292--304},
  doi       = {10.1007/11564096_30},
  url       = {https://mlanthology.org/ecmlpkdd/2005/prescher2005ecml-inducing/}
}