Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes
Abstract
We propose the hierarchical Dirichlet process (HDP), a nonparametric Bayesian model for clustering problems involving multiple groups of data. Each group of data is modeled with a mixture, with the number of components being open-ended and inferred automatically by the model. Further, components can be shared across groups, which allows dependencies across groups to be modeled effectively and confers generalization to new groups. Such grouped clustering problems occur often in practice, e.g. in the problem of topic discovery in document corpora. We report experimental results on three text corpora showing that the HDP outperforms previous models.
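To make the idea of sharing components across groups concrete, the following is a minimal sketch of sampling cluster assignments from the Chinese restaurant franchise, a standard description of the HDP prior that the abstract itself does not spell out. The function name `sample_crf`, the concentration parameters `alpha` and `gamma`, and their default values are illustrative assumptions. The sketch draws assignments from the prior only and does not perform posterior inference.

```python
import random

def sample_crf(group_sizes, alpha=1.0, gamma=1.0, seed=0):
    """Draw component labels for each group from a Chinese restaurant franchise prior.

    Hypothetical helper for illustration; alpha and gamma are assumed
    concentration parameters at the group level and the top level.
    """
    rng = random.Random(seed)
    tables_per_component = []  # m_k: tables (over all groups) using component k
    assignments = []
    for n_items in group_sizes:
        table_sizes = []       # n_jt: items seated at each table in this group
        table_component = []   # component index used by each table
        labels = []
        for _ in range(n_items):
            # Join an existing table with weight n_jt, or open a new one with weight alpha.
            t = rng.choices(range(len(table_sizes) + 1),
                            weights=table_sizes + [alpha])[0]
            if t == len(table_sizes):
                # A new table reuses component k with weight m_k,
                # or creates a brand-new component with weight gamma.
                k = rng.choices(range(len(tables_per_component) + 1),
                                weights=tables_per_component + [gamma])[0]
                if k == len(tables_per_component):
                    tables_per_component.append(0)
                tables_per_component[k] += 1
                table_sizes.append(0)
                table_component.append(k)
            table_sizes[t] += 1
            labels.append(table_component[t])
        assignments.append(labels)
    return assignments, len(tables_per_component)

if __name__ == "__main__":
    groups, n_components = sample_crf([10, 10, 10], alpha=1.0, gamma=1.0, seed=0)
    for j, labels in enumerate(groups):
        print(f"group {j}: {labels}")
    print("components used:", n_components)
```

Because the component counts are kept in one list shared by all groups, a component introduced in one group can be reused by later groups, and the total number of components is not fixed in advance.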
Cite
Text
Teh et al. "Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes." Neural Information Processing Systems, 2004.
Markdown
[Teh et al. "Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes." Neural Information Processing Systems, 2004.](https://mlanthology.org/neurips/2004/teh2004neurips-sharing/)
BibTeX
@inproceedings{teh2004neurips-sharing,
title = {{Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes}},
author = {Teh, Yee W. and Jordan, Michael I. and Beal, Matthew J. and Blei, David M.},
booktitle = {Neural Information Processing Systems},
year = {2004},
pages = {1385-1392},
url = {https://mlanthology.org/neurips/2004/teh2004neurips-sharing/}
}