Interpolating Between Types and Tokens by Estimating Power-Law Generators
Abstract
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process – the Pitman-Yor process – as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
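The two-stage idea in the abstract, where a generator proposes word types and an adaptor reuses them to produce tokens, can be illustrated with the Chinese restaurant representation of a Pitman-Yor process. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `pitman_yor_adaptor`, the uniform 1,000-type generator, and the parameter values are hypothetical. With discount a and concentration b, after n tokens seated at K tables (table k holding n_k tokens), the next token joins table k with probability (n_k - a)/(n + b) and opens a new table with probability (K*a + b)/(n + b); each new table receives a word type drawn from the generator, and every token takes the type of its table.

```python
import random
from collections import Counter


def pitman_yor_adaptor(generator, n_tokens, a=0.8, b=1.0, seed=0):
    """Sketch: sample tokens from a Pitman-Yor Chinese restaurant process.

    generator: hypothetical callable taking a random.Random and returning a
        word type; it plays the role of the base (generator) distribution.
    a: discount parameter, 0 <= a < 1.
    b: concentration parameter, b > -a.
    Returns (tokens, table_labels, table_counts).
    """
    if not 0 <= a < 1:
        raise ValueError("discount a must satisfy 0 <= a < 1")
    if b <= -a:
        raise ValueError("concentration b must satisfy b > -a")
    rng = random.Random(seed)
    table_labels = []  # word type the generator assigned to each table
    table_counts = []  # number of tokens (customers) seated at each table
    tokens = []
    for _ in range(n_tokens):
        num_tables = len(table_labels)
        if num_tables == 0:
            k = 0  # the first token always opens a new table
        else:
            # Unnormalized weights; the shared normalizer n + b cancels.
            weights = [count - a for count in table_counts]
            weights.append(num_tables * a + b)  # weight of a new table
            k = rng.choices(range(num_tables + 1), weights=weights)[0]
        if k == num_tables:
            # New table: its label is drawn from the generator.
            table_labels.append(generator(rng))
            table_counts.append(1)
        else:
            table_counts[k] += 1
        tokens.append(table_labels[k])
    return tokens, table_labels, table_counts


if __name__ == "__main__":
    # Hypothetical generator: uniform over 1,000 placeholder word types.
    def uniform_generator(rng):
        return f"w{rng.randrange(1000)}"

    tokens, labels, counts = pitman_yor_adaptor(uniform_generator, 5000)
    freq = Counter(tokens)
    print(len(freq), "distinct types;", len(labels), "tables")
    print(freq.most_common(5))
```

Tabulating the token frequencies typically shows a few very frequent types and a long tail of rare ones even though the generator itself is uniform; increasing the discount a generally yields more tables and a heavier tail. Note that two tables can carry the same word type, since the generator may propose a type more than once.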
Cite
Text
Goldwater et al. "Interpolating Between Types and Tokens by Estimating Power-Law Generators." Neural Information Processing Systems, 2005.
Markdown
[Goldwater et al. "Interpolating Between Types and Tokens by Estimating Power-Law Generators." Neural Information Processing Systems, 2005.](https://mlanthology.org/neurips/2005/goldwater2005neurips-interpolating/)
BibTeX
@inproceedings{goldwater2005neurips-interpolating,
title = {{Interpolating Between Types and Tokens by Estimating Power-Law Generators}},
author = {Goldwater, Sharon and Johnson, Mark and Griffiths, Thomas L.},
booktitle = {Neural Information Processing Systems},
year = {2005},
pages = {459--466},
url = {https://mlanthology.org/neurips/2005/goldwater2005neurips-interpolating/}
}