Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment

Abstract

The use of topic models to analyze domain-specific texts often requires manual validation of the latent topics to ensure they are meaningful. We introduce a framework to support large-scale assessment of topical relevance. We measure the correspondence between a set of latent topics and a set of reference concepts to quantify four types of topical misalignment: junk, fused, missing, and repeated topics. Our analysis compares 10,000 topic model variants to 200 expert-provided domain concepts and demonstrates how our framework can inform choices of model parameters, inference algorithms, and intrinsic measures of topical quality.
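The four misalignment types can be read off a topic-by-concept correspondence. A junk topic matches no concept, and a fused topic matches several. A missing concept is matched by no topic, and a repeated concept is matched by several. The sketch below is a simplified illustration, not the authors' method. It assumes each topic and each concept is a word-weight vector over a shared vocabulary, scores pairs with cosine similarity, and treats a pair as matched above a fixed cutoff. The function names, the `threshold` parameter, and the neutral `matched` label are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(topics, concepts):
    """Cosine similarity between every topic row and every concept row.

    topics:   (n_topics, vocab_size) word weights per latent topic
    concepts: (n_concepts, vocab_size) word weights per reference concept
    Assumes no all-zero rows (their norm would be zero).
    """
    t = topics / np.linalg.norm(topics, axis=1, keepdims=True)
    c = concepts / np.linalg.norm(concepts, axis=1, keepdims=True)
    return t @ c.T  # shape (n_topics, n_concepts)


def classify_misalignment(similarity, threshold=0.5):
    """Label topics and concepts from a topic-by-concept similarity matrix.

    `threshold` is a hypothetical cutoff: a topic and a concept count as
    matched when their similarity is at least this value.
    """
    matched = np.asarray(similarity) >= threshold  # boolean correspondence
    concepts_per_topic = matched.sum(axis=1)
    topics_per_concept = matched.sum(axis=0)

    topic_labels = []
    for count in concepts_per_topic:
        if count == 0:
            topic_labels.append("junk")      # matches no reference concept
        elif count > 1:
            topic_labels.append("fused")     # blends several concepts
        else:
            topic_labels.append("matched")   # exactly one concept

    concept_labels = []
    for count in topics_per_concept:
        if count == 0:
            concept_labels.append("missing")   # no topic captures it
        elif count > 1:
            concept_labels.append("repeated")  # split across several topics
        else:
            concept_labels.append("matched")   # exactly one topic
    return topic_labels, concept_labels


if __name__ == "__main__":
    # Toy data over a 7-word vocabulary (illustrative, not from the paper).
    concepts = np.array([
        [1, 0, 0, 0, 0, 0, 0],  # concept X
        [0, 1, 0, 0, 0, 0, 0],  # concept Y
        [0, 0, 1, 0, 0, 0, 0],  # concept Z
        [0, 0, 0, 1, 0, 0, 0],  # concept W
        [0, 0, 0, 0, 1, 0, 0],  # concept V
    ], dtype=float)
    topics = np.array([
        [1, 0, 0, 0, 0, 0, 0],  # matches X only
        [0, 1, 1, 0, 0, 0, 0],  # mixes Y and Z
        [0, 0, 0, 0, 0, 1, 1],  # unrelated to every concept
        [0, 0, 0, 1, 0, 1, 0],  # overlaps W
        [0, 0, 0, 1, 0, 0, 1],  # also overlaps W
    ], dtype=float)
    sim = cosine_similarity_matrix(topics, concepts)
    topic_labels, concept_labels = classify_misalignment(sim)
    print(topic_labels)    # ['matched', 'fused', 'junk', 'matched', 'matched']
    print(concept_labels)  # ['matched', 'matched', 'matched', 'repeated', 'missing']
```

A hard cutoff is the simplest way to turn similarities into a correspondence. The paper's own correspondence measure may differ, so treat these labels as a rough diagnostic rather than a reproduction of its results.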

Cite

Text

Chuang et al. "Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment." International Conference on Machine Learning, 2013.

Markdown

[Chuang et al. "Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment." International Conference on Machine Learning, 2013.](https://mlanthology.org/icml/2013/chuang2013icml-topic/)

BibTeX

@inproceedings{chuang2013icml-topic,
  title     = {{Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment}},
  author    = {Chuang, Jason and Gupta, Sonal and Manning, Christopher and Heer, Jeffrey},
  booktitle = {International Conference on Machine Learning},
  year      = {2013},
  pages     = {612--620},
  volume    = {28},
  url       = {https://mlanthology.org/icml/2013/chuang2013icml-topic/}
}