The Cluster-Abstraction Model: Unsupervised Learning of Topic Hierarchies from Text Data

Abstract

This paper presents a novel statistical latent class model for text mining and interactive information access. The described learning architecture, called Cluster-Abstraction Model (CAM), is purely data-driven and utilizes context-specific word occurrence statistics. In an intertwined fashion, the CAM extracts hierarchical relations between groups of documents as well as an abstractive organization of keywords. An annealed version of the Expectation-Maximization (EM) algorithm for maximum likelihood estimation of the model parameters is derived. The benefits of the CAM for interactive retrieval and automated cluster summarization are investigated experimentally.

1 Introduction

Intelligent processing of text and documents ultimately has to be considered as a problem of natural language understanding. This paper presents a statistical approach to learning of language models for context-dependent word occurrences and discusses the applicability of this model for interactive informati...

Cite

Text

Hofmann. "The Cluster-Abstraction Model: Unsupervised Learning of Topic Hierarchies from Text Data." International Joint Conference on Artificial Intelligence, 1999.

Markdown

[Hofmann. "The Cluster-Abstraction Model: Unsupervised Learning of Topic Hierarchies from Text Data." International Joint Conference on Artificial Intelligence, 1999.](https://mlanthology.org/ijcai/1999/hofmann1999ijcai-cluster/)

BibTeX

@inproceedings{hofmann1999ijcai-cluster,
  title     = {{The Cluster-Abstraction Model: Unsupervised Learning of Topic Hierarchies from Text Data}},
  author    = {Hofmann, Thomas},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {1999},
  pages     = {682--687},
  url       = {https://mlanthology.org/ijcai/1999/hofmann1999ijcai-cluster/}
}