Precision-Recall Balanced Topic Modelling

Abstract

Topic models are becoming increasingly relevant probabilistic models for dimensionality reduction of text data, inferring topics that capture meaningful themes of frequently co-occurring terms. We formulate topic modelling as an information retrieval task, where the goal is, based on the latent topic representation, to capture relevant term co-occurrence patterns. We evaluate performance for this task rigorously with regard to two types of errors, false negatives and false positives, based on the well-known precision-recall trade-off, and provide a statistical model that allows the user to balance between the contributions of the different error types. When the user focuses solely on the contribution of false negatives, ignoring false positives altogether, our proposed model reduces to a standard topic model. Extensive experiments demonstrate that the proposed approach is effective and infers more coherent topics than existing related approaches.

Cite

Text

Virtanen and Girolami. "Precision-Recall Balanced Topic Modelling." Neural Information Processing Systems, 2019.

Markdown

[Virtanen and Girolami. "Precision-Recall Balanced Topic Modelling." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/virtanen2019neurips-precisionrecall/)

BibTeX

@inproceedings{virtanen2019neurips-precisionrecall,
  title     = {{Precision-Recall Balanced Topic Modelling}},
  author    = {Virtanen, Seppo and Girolami, Mark},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {6750--6759},
  url       = {https://mlanthology.org/neurips/2019/virtanen2019neurips-precisionrecall/}
}