Latent Dirichlet Allocation
Abstract
We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.
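The generative process the abstract describes can be sketched in code: draw per-document topic proportions from a Dirichlet prior, then for each word draw a topic and then a word from that topic's distribution. This is a minimal illustrative sketch, not the paper's inference procedure; the dimensions, prior values, and the `generate_document` helper are all hypothetical choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical corpus dimensions (illustrative only, not from the paper).
num_topics = 3    # number of topics k
vocab_size = 8    # vocabulary size V
doc_length = 10   # words per document N

# Dirichlet prior on per-document topic proportions (hypothetical alpha).
alpha = np.full(num_topics, 0.5)
# Per-topic word distributions (here random; learned from data in practice).
beta = rng.dirichlet(np.ones(vocab_size), size=num_topics)

def generate_document():
    # theta ~ Dirichlet(alpha): this document's mixture of topics.
    theta = rng.dirichlet(alpha)
    words = []
    for _ in range(doc_length):
        z = rng.choice(num_topics, p=theta)    # sample a topic assignment
        w = rng.choice(vocab_size, p=beta[z])  # sample a word from that topic
        words.append(w)
    return words

doc = generate_document()
```

Each document gets its own `theta`, which is what lets a single document mix several topics, in contrast to a mixture-of-unigrams model where one topic is drawn per document.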
Cite
Text
Blei et al. "Latent Dirichlet Allocation." Neural Information Processing Systems, 2001.
Markdown
[Blei et al. "Latent Dirichlet Allocation." Neural Information Processing Systems, 2001.](https://mlanthology.org/neurips/2001/blei2001neurips-latent/)
BibTeX
@inproceedings{blei2001neurips-latent,
title = {{Latent Dirichlet Allocation}},
author = {Blei, David M. and Ng, Andrew Y. and Jordan, Michael I.},
booktitle = {Neural Information Processing Systems},
year = {2001},
pages = {601-608},
url = {https://mlanthology.org/neurips/2001/blei2001neurips-latent/}
}