The Doubly Correlated Nonparametric Topic Model
Abstract
Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata.
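As a rough illustration of the stick-breaking step mentioned above, the following Python sketch turns per-topic scores from a Gaussian regression on document features into topic weights. It is a minimal sketch under stated assumptions, not the authors' implementation: the logistic link, the finite truncation, the noise scale, and every name and number in it (`stick_breaking_weights`, `document_scores`, `eta`, `features`) are hypothetical, and the model's square-root covariance term for topic correlations is left out entirely.

```python
import math
import random


def logistic(x):
    """Map a real number to (0, 1) without overflowing for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def stick_breaking_weights(scores):
    """Convert real-valued scores into topic weights by stick-breaking.

    Topic k takes the fraction logistic(scores[k]) of the stick length left
    over by earlier topics. The final entry of the result is the leftover
    mass for all topics beyond the truncation (an assumption of this sketch).
    """
    weights = []
    remaining = 1.0
    for s in scores:
        frac = logistic(s)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)
    return weights


def document_scores(features, eta, rng, noise_std=1.0):
    """Hypothetical Gaussian regression: score_k ~ Normal(eta[k] . features, noise_std)."""
    return [
        rng.gauss(sum(w * f for w, f in zip(eta_k, features)), noise_std)
        for eta_k in eta
    ]


rng = random.Random(0)
features = [1.0, 0.5]                          # illustrative metadata features
eta = [[0.2, -1.0], [1.5, 0.3], [-0.7, 0.8]]   # illustrative weights for 3 topics
weights = stick_breaking_weights(document_scores(features, eta, rng))
print(weights, sum(weights))
```

Because each topic only claims a fraction of what remains, the weights always sum to one, and adding more candidate topics never requires renormalizing the earlier ones. That is what lets such a construction range over an unbounded series of topics in principle.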
Cite
Text
Kim and Sudderth. "The Doubly Correlated Nonparametric Topic Model." Neural Information Processing Systems, 2011.
Markdown
[Kim and Sudderth. "The Doubly Correlated Nonparametric Topic Model." Neural Information Processing Systems, 2011.](https://mlanthology.org/neurips/2011/kim2011neurips-doubly/)
BibTeX
@inproceedings{kim2011neurips-doubly,
title = {{The Doubly Correlated Nonparametric Topic Model}},
author = {Kim, Dae I. and Sudderth, Erik B.},
booktitle = {Neural Information Processing Systems},
year = {2011},
pages = {1980--1988},
url = {https://mlanthology.org/neurips/2011/kim2011neurips-doubly/}
}