Using Vocabulary Knowledge in Bayesian Multinomial Estimation

Abstract

Estimating the parameters of sparse multinomial distributions is an important component of many statistical learning tasks. Recent approaches have used uncertainty over the vocabulary of symbols in a multinomial distribution as a means of accounting for sparsity. We present a Bayesian approach that allows weak prior knowledge, in the form of a small set of approximate candidate vocabularies, to be used to dramatically improve the resulting estimates. We demonstrate these improvements in applications to text compression and estimating distributions over words in newsgroup data.
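The idea of averaging over candidate vocabularies can be illustrated with a minimal sketch. This is not the paper's exact method (the paper handles *approximate* vocabularies; this sketch assumes a vocabulary must contain every observed symbol): each candidate vocabulary gets a posterior weight from its Dirichlet-multinomial marginal likelihood, and the final estimate is the weighted average of Dirichlet-smoothed estimates under each vocabulary. All function names and the uniform prior over vocabularies are illustrative assumptions.

```python
from math import lgamma, exp

def log_marginal_likelihood(counts, vocab, alpha=1.0):
    """Log Dirichlet-multinomial marginal likelihood of `counts` under a
    symmetric Dirichlet(alpha) prior restricted to `vocab`.
    Returns -inf if an observed symbol lies outside the vocabulary
    (a simplification; the paper allows approximate vocabularies)."""
    if any(c > 0 and s not in vocab for s, c in counts.items()):
        return float("-inf")
    n = sum(counts.values())
    k = len(vocab)
    ll = lgamma(alpha * k) - lgamma(n + alpha * k)
    for s in vocab:
        ll += lgamma(counts.get(s, 0) + alpha) - lgamma(alpha)
    return ll

def estimate(counts, vocabularies, alpha=1.0):
    """Posterior-mean probability estimates, averaging the smoothed
    estimate under each candidate vocabulary, weighted by the
    vocabulary's posterior probability (uniform prior over candidates)."""
    logs = [log_marginal_likelihood(counts, v, alpha) for v in vocabularies]
    m = max(logs)  # log-sum-exp trick for numerical stability
    weights = [exp(l - m) for l in logs]
    z = sum(weights)
    weights = [w / z for w in weights]
    n = sum(counts.values())
    est = {}
    for w, vocab in zip(weights, vocabularies):
        denom = n + alpha * len(vocab)
        for s in vocab:
            est[s] = est.get(s, 0.0) + w * (counts.get(s, 0) + alpha) / denom
    return est
```

Because each per-vocabulary estimate sums to one over its own symbols, the weighted average is again a proper distribution; symbols appearing only in larger candidate vocabularies receive a small but nonzero share of probability mass, which is how vocabulary uncertainty accounts for sparsity.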

Cite

Text

Griffiths and Tenenbaum. "Using Vocabulary Knowledge in Bayesian Multinomial Estimation." Neural Information Processing Systems, 2001.

Markdown

[Griffiths and Tenenbaum. "Using Vocabulary Knowledge in Bayesian Multinomial Estimation." Neural Information Processing Systems, 2001.](https://mlanthology.org/neurips/2001/griffiths2001neurips-using/)

BibTeX

@inproceedings{griffiths2001neurips-using,
  title     = {{Using Vocabulary Knowledge in Bayesian Multinomial Estimation}},
  author    = {Griffiths, Thomas L. and Tenenbaum, Joshua B.},
  booktitle = {Neural Information Processing Systems},
  year      = {2001},
  pages     = {1385--1392},
  url       = {https://mlanthology.org/neurips/2001/griffiths2001neurips-using/}
}