Hierarchical Latent Tree Analysis for Topic Detection
Abstract
In the LDA approach to topic detection, a topic is determined by identifying the words that are used with high frequency when writing about the topic. However, high-frequency words in one topic may also be used with high frequency in other topics. Thus they may not be the best words to characterize the topic. In this paper, we propose a new method for topic detection, where a topic is determined by identifying words that appear with high frequency in the topic and low frequency in other topics. We model patterns of word co-occurrence and co-occurrences of those patterns using a hierarchy of discrete latent variables. The states of the latent variables represent clusters of documents and they are interpreted as topics. The words that best distinguish a cluster from other clusters are selected to characterize the topic. Empirical results show that the new method yields topics with clearer thematic characterizations than the alternative approaches.
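The contrast between "frequent in the topic" and "frequent in the topic but rare elsewhere" can be illustrated with a toy example. The sketch below is hypothetical and does not reproduce the paper's method. The document clusters are given by hand rather than read off latent-variable states, and the score (occurrence rate inside the cluster minus occurrence rate in all other documents) is an assumed stand-in, because the abstract does not state the exact selection criterion. The function names `occurrence_rate`, `top_frequent_words` and `top_distinguishing_words` are illustrative only.

```python
from typing import List, Set

Doc = Set[str]  # a document represented as the set of words it contains


def occurrence_rate(docs: List[Doc], word: str) -> float:
    """Fraction of documents in `docs` that contain `word`."""
    if not docs:
        return 0.0
    return sum(word in d for d in docs) / len(docs)


def top_frequent_words(cluster: List[Doc], k: int) -> List[str]:
    """Rank words only by how often they occur inside the cluster
    (the frequency-based characterization the abstract criticizes)."""
    vocab = sorted(set().union(*cluster))
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(vocab, key=lambda w: (-occurrence_rate(cluster, w), w))[:k]


def top_distinguishing_words(cluster: List[Doc], rest: List[Doc], k: int) -> List[str]:
    """Rank words by (rate in cluster) - (rate in other documents).
    ASSUMPTION: this simple difference is a hypothetical stand-in,
    not the criterion used in the paper."""
    vocab = sorted(set().union(*cluster))

    def score(w: str) -> float:
        return occurrence_rate(cluster, w) - occurrence_rate(rest, w)

    return sorted(vocab, key=lambda w: (-score(w), w))[:k]


# Hypothetical toy clusters; "year" is common to both.
sports = [{"game", "team", "win", "year"},
          {"team", "score", "year"},
          {"game", "season", "year", "win"}]
politics = [{"election", "vote", "year"},
            {"party", "vote", "year", "win"},
            {"election", "party", "year"}]

print(top_frequent_words(sports, 3))                  # ['year', 'game', 'team']
print(top_distinguishing_words(sports, politics, 3))  # ['game', 'team', 'score']
```

In this toy data, "year" tops the frequency ranking for the sports cluster but carries no thematic information, since it occurs in every politics document as well. The difference score pushes it to the bottom, which is the effect the abstract describes.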
Cite
Text
Liu et al. "Hierarchical Latent Tree Analysis for Topic Detection." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2014. doi:10.1007/978-3-662-44851-9_17
Markdown
[Liu et al. "Hierarchical Latent Tree Analysis for Topic Detection." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2014.](https://mlanthology.org/ecmlpkdd/2014/liu2014ecmlpkdd-hierarchical/) doi:10.1007/978-3-662-44851-9_17
BibTeX
@inproceedings{liu2014ecmlpkdd-hierarchical,
title = {{Hierarchical Latent Tree Analysis for Topic Detection}},
author = {Liu, Tengfei and Zhang, Nevin Lianwen and Chen, Peixian},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2014},
pages = {256--272},
doi = {10.1007/978-3-662-44851-9_17},
url = {https://mlanthology.org/ecmlpkdd/2014/liu2014ecmlpkdd-hierarchical/}
}