The Doubly Correlated Nonparametric Topic Model

Abstract

Topic models are learned via a statistical model of variation within document collections, but are designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated with various metadata.
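
As a rough illustration of the stick-breaking idea mentioned in the abstract, the sketch below converts a sequence of break fractions into topic weights, truncated at a finite number of topics. It is a generic construction and not the paper's implementation: the function names, the truncation level of 10, and the choice to obtain fractions by passing Gaussian scores through a logistic function are all assumptions made for this example.

```python
import math
import random


def stick_breaking_weights(fractions):
    """Turn break fractions in (0, 1) into topic weights.

    weight_k = fractions[k] * prod_{l < k} (1 - fractions[l])
    Returns the weights and the stick length left unassigned.
    """
    weights = []
    remaining = 1.0  # portion of the unit stick not yet assigned
    for v in fractions:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining


def logistic(x):
    """Map a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical example: Gaussian scores squashed into break fractions.
# The truncation level (10 topics) is an assumption for illustration.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(10)]
weights, leftover = stick_breaking_weights([logistic(s) for s in scores])
```

The weights and the leftover stick always sum to one, so a truncated sequence stays a valid (sub)probability vector, and the leftover mass shows how much weight the truncation ignores.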

Cite

Text

Kim and Sudderth. "The Doubly Correlated Nonparametric Topic Model." Neural Information Processing Systems, 2011.

Markdown

[Kim and Sudderth. "The Doubly Correlated Nonparametric Topic Model." Neural Information Processing Systems, 2011.](https://mlanthology.org/neurips/2011/kim2011neurips-doubly/)

BibTeX

@inproceedings{kim2011neurips-doubly,
  title     = {{The Doubly Correlated Nonparametric Topic Model}},
  author    = {Kim, Dae I. and Sudderth, Erik B.},
  booktitle = {Neural Information Processing Systems},
  year      = {2011},
  pages     = {1980-1988},
  url       = {https://mlanthology.org/neurips/2011/kim2011neurips-doubly/}
}