Lifelong Hierarchical Topic Modeling via Nonparametric Word Embedding Clustering

Abstract

Hierarchical topic models, which mine topics representing latent semantics and organize them into hierarchies, have been widely developed. However, existing methods often assume a fixed topic hierarchy, which leads to poor performance on document streams. Meanwhile, prior knowledge of topic structure is helpful for hierarchical topic modeling, but obtaining it manually is costly. To address these issues, we propose a lifelong hierarchical topic model that automatically learns a flexible topic structure through nonparametric word embedding clustering. In addition, we design a knowledge base in the form of word hierarchies that serves as automatically extracted prior knowledge to support topic structure generation. Furthermore, we update the knowledge base by accumulating structure information from the past. Experiments on real-world datasets demonstrate that our method can generate a rational, flexible, and coherent topic structure. Lifelong learning evaluations also validate that our method is less affected by catastrophic forgetting than baseline models. Our code is available at https://github.com/yjx5050ptol/LNCHTM.
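To illustrate what "nonparametric" clustering of word embeddings can mean in general, the following minimal sketch uses DP-means, a simple algorithm in which the number of clusters is not fixed in advance but grows whenever a point lies too far from every existing centroid. This is a stand-in technique chosen for illustration, not the authors' method; the function name `dp_means`, the threshold `lam`, and the toy 2-D "embeddings" are all assumptions made up for this example.

```python
def dp_means(points, lam, max_iter=100):
    """Toy DP-means: open a new cluster when a point's squared distance
    to every existing centroid exceeds lam (illustrative only)."""
    centroids = [list(points[0])]          # start with one cluster
    assign = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            k = min(range(len(d2)), key=d2.__getitem__)
            if d2[k] > lam:                # too far from all clusters
                centroids.append(list(p))  # so the model grows by one
                k = len(centroids) - 1
            if assign[i] != k:
                assign[i] = k
                changed = True
        # Recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [points[i] for i in range(len(points)) if assign[i] == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
        if not changed:
            break
    return assign, centroids


# Hypothetical 2-D word vectors; real embeddings have hundreds of dimensions.
words = ["goal", "match", "team", "stock", "market", "bank"]
vecs = [(1.0, 0.1), (0.9, 0.2), (1.1, 0.0), (0.0, 1.0), (0.1, 0.9), (0.2, 1.1)]

assign, centroids = dp_means(vecs, lam=0.5)
for word, cluster in zip(words, assign):
    print(word, cluster)
```

With this toy data the algorithm settles on two clusters without being told the count; raising `lam` merges them into one, which is how the threshold trades off granularity against the number of topics.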

Cite

Text

Yan et al. "Lifelong Hierarchical Topic Modeling via Nonparametric Word Embedding Clustering." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024. doi:10.1007/978-3-031-70371-3_16

Markdown

[Yan et al. "Lifelong Hierarchical Topic Modeling via Nonparametric Word Embedding Clustering." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024.](https://mlanthology.org/ecmlpkdd/2024/yan2024ecmlpkdd-lifelong/) doi:10.1007/978-3-031-70371-3_16

BibTeX

@inproceedings{yan2024ecmlpkdd-lifelong,
  title     = {{Lifelong Hierarchical Topic Modeling via Nonparametric Word Embedding Clustering}},
  author    = {Yan, Jiaxing and Lu, Yuyin and Chen, Hegang and Yu, Jianxing and Rao, Yanghui},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2024},
  pages     = {270--287},
  doi       = {10.1007/978-3-031-70371-3_16},
  url       = {https://mlanthology.org/ecmlpkdd/2024/yan2024ecmlpkdd-lifelong/}
}