The IBP Compound Dirichlet Process and Its Application to Focused Topic Modeling

Abstract

The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric mixed membership model: each data point is modeled with a collection of components in different proportions. Though powerful, the HDP assumes that the probability of a component being exhibited by a data point is positively correlated with its proportion within that data point. This assumption is not always desirable. For example, in topic modeling, a topic (component) might be rare throughout the corpus but dominant within those documents (data points) where it occurs. We develop the IBP compound Dirichlet process (ICD), a Bayesian nonparametric prior that decouples across-data prevalence and within-data proportion in a mixed membership model. The ICD combines properties of the HDP and the Indian buffet process (IBP), a Bayesian nonparametric prior on binary matrices. The ICD assigns a subset of the shared mixture components to each data point. This subset, the data point's "focus", is determined independently of the amount that each of its components contributes. We develop an ICD mixture model for text, the focused topic model (FTM), and show that it outperforms the HDP-based topic model.
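The decoupling described above can be illustrated with a small generative sketch. The abstract does not spell out the construction, so the following is an assumption-laden illustration rather than the authors' implementation: it assumes a stick-breaking IBP truncated at a fixed number of topics (topic k is selected for a document with probability pi_k), an independent gamma-distributed "mass" phi_k per topic, and per-document proportions drawn from a Dirichlet restricted to the selected topics. The function name `sample_icd`, the parameters `alpha` and `gamma`, the truncation level, and the fallback for empty documents are all hypothetical choices made for this example.

```python
import random

def sample_icd(num_docs, num_topics, alpha=5.0, gamma=1.0, seed=0):
    """Hypothetical truncated sketch of an ICD-style prior on topic proportions.

    Assumptions (not stated in the abstract): stick-breaking IBP truncated
    at num_topics, Gamma(gamma, 1) topic masses, Dirichlet over the focus.
    """
    rng = random.Random(seed)

    # Stick-breaking IBP (assumed): pi_k = prod_{j<=k} mu_j, mu_j ~ Beta(alpha, 1).
    # pi_k controls how many documents exhibit topic k (across-data prevalence).
    pis, stick = [], 1.0
    for _ in range(num_topics):
        stick *= rng.betavariate(alpha, 1.0)
        pis.append(stick)

    # Topic masses phi_k, drawn independently of pi_k; they control how much
    # of a document a topic takes up when it is selected (within-data proportion).
    phis = [rng.gammavariate(gamma, 1.0) for _ in range(num_topics)]

    focus, thetas = [], []
    for _ in range(num_docs):
        # The document's "focus": topic k is switched on with probability pi_k.
        b = [1 if rng.random() < p else 0 for p in pis]
        if not any(b):
            # Hypothetical fallback: pi_k decreases in k, so topic 0 is the
            # most prevalent; force it on so the document has some topic.
            b[0] = 1
        # theta ~ Dirichlet(b * phi), sampled by normalising independent
        # Gamma(phi_k, 1) draws over the selected topics only.
        g = [rng.gammavariate(phis[k], 1.0) if b[k] else 0.0
             for k in range(num_topics)]
        total = sum(g)
        if total == 0.0:
            # Guard against floating-point underflow for tiny shape parameters.
            g, total = [float(x) for x in b], float(sum(b))
        thetas.append([x / total for x in g])
        focus.append(b)
    return pis, phis, focus, thetas
```

For example, `sample_icd(num_docs=20, num_topics=10)` returns the prevalences, masses, binary focus vectors, and proportion vectors for 20 synthetic documents. Because `b` and `phis` are drawn independently in this sketch, a topic with a small `pi_k` (rarely selected) can still have a large `phi_k` and dominate the few documents in which it appears, which is the behavior the abstract contrasts with the HDP.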

Cite

Text

Williamson et al. "The IBP Compound Dirichlet Process and Its Application to Focused Topic Modeling." International Conference on Machine Learning, 2010.

Markdown

[Williamson et al. "The IBP Compound Dirichlet Process and Its Application to Focused Topic Modeling." International Conference on Machine Learning, 2010.](https://mlanthology.org/icml/2010/williamson2010icml-ibp/)

BibTeX

@inproceedings{williamson2010icml-ibp,
  title     = {{The IBP Compound Dirichlet Process and Its Application to Focused Topic Modeling}},
  author    = {Williamson, Sinead and Wang, Chong and Heller, Katherine A. and Blei, David M.},
  booktitle = {International Conference on Machine Learning},
  year      = {2010},
  pages     = {1151--1158},
  url       = {https://mlanthology.org/icml/2010/williamson2010icml-ibp/}
}