Interpolating Between Types and Tokens by Estimating Power-Law Generators

Abstract

Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process – the Pitman-Yor process – as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
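The generator/adaptor split described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. The generator here is a hypothetical random-string distribution. The adaptor follows the standard Pitman-Yor Chinese restaurant seating rule, with parameters written as `a` (discount) and `b` (concentration). A token joins an existing table t with probability proportional to n_t - a and opens a new table with probability proportional to K*a + b. The function names and default values are illustrative only.

```python
import random
from collections import Counter


def generator(rng, alphabet="abcde", stop_prob=0.3):
    """Hypothetical generator: a random string with geometrically distributed length."""
    chars = [rng.choice(alphabet)]
    while rng.random() > stop_prob:
        chars.append(rng.choice(alphabet))
    return "".join(chars)


def pitman_yor_adaptor(n_tokens, a=0.8, b=1.0, seed=0):
    """Sample n_tokens word tokens from a Pitman-Yor adaptor wrapped around `generator`.

    Chinese-restaurant view: each table is labelled with one word type drawn from
    the generator; each token is a customer who joins a table and emits its label.
    """
    if not (0 <= a < 1 and b > -a):
        raise ValueError("Pitman-Yor parameters require 0 <= a < 1 and b > -a")
    rng = random.Random(seed)
    table_counts = []  # customers (tokens) seated at each table
    table_labels = []  # word type the generator assigned to each table
    tokens = []
    for n in range(n_tokens):
        k = len(table_counts)
        if k == 0:
            choice = 0  # the first customer always opens a new table
        else:
            # existing table t has weight n_t - a; a new table has weight k*a + b;
            # these weights sum to n + b, matching the Pitman-Yor seating rule
            weights = [c - a for c in table_counts] + [k * a + b]
            choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            # new table: the generator is consulted once, to label it
            table_counts.append(1)
            table_labels.append(generator(rng))
        else:
            table_counts[choice] += 1
        tokens.append(table_labels[choice])
    return tokens, table_counts, table_labels


if __name__ == "__main__":
    toks, counts, labels = pitman_yor_adaptor(10_000, a=0.8, b=1.0, seed=1)
    freqs = Counter(toks)
    print("tokens:", len(toks), "tables:", len(counts), "types:", len(freqs))
    print("most frequent types:", freqs.most_common(5))
```

The generator is called once per table, never once per token. It therefore "sees" something between the raw token counts and the set of distinct types. Because it can emit the same string for two different tables, the number of distinct types can be smaller than the number of tables. With `a > 0`, the number of tables grows sublinearly in the number of tokens, and the token frequencies are heavy-tailed.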

Cite

Text

Goldwater et al. "Interpolating Between Types and Tokens by Estimating Power-Law Generators." Neural Information Processing Systems, 2005.

Markdown

[Goldwater et al. "Interpolating Between Types and Tokens by Estimating Power-Law Generators." Neural Information Processing Systems, 2005.](https://mlanthology.org/neurips/2005/goldwater2005neurips-interpolating/)

BibTeX

@inproceedings{goldwater2005neurips-interpolating,
  title     = {{Interpolating Between Types and Tokens by Estimating Power-Law Generators}},
  author    = {Goldwater, Sharon and Johnson, Mark and Griffiths, Thomas L.},
  booktitle = {Neural Information Processing Systems},
  year      = {2005},
  pages     = {459--466},
  url       = {https://mlanthology.org/neurips/2005/goldwater2005neurips-interpolating/}
}