Encouraging Sparsity in Neural Topic Modeling with Non-Mean-Field Inference

Abstract

Topic modeling is a popular method for discovering semantic information from textual data, with latent Dirichlet allocation (LDA) being a representative model. Recently, researchers have explored the use of variational autoencoders (VAE) to improve the performance of LDA. However, there remain two major limitations: (1) the Dirichlet prior is inadequate to extract precise semantic information in VAE-LDA models, as it introduces a trade-off between the topic quality and the sparsity of representations; (2) new variants of VAE-LDA models with auxiliary variables generally ignore the correlation between latent variables in the inference process due to the mean-field assumption. To address these issues, in this paper, we propose a Sparsity Reinforced and Non-Mean-Field Topic Model (SpareNTM) with a bank of auxiliary Bernoulli variables in the generative process of LDA to further model the sparsity of document representations. Thus, individual documents are forced to focus on a subset of topics by a corresponding Bernoulli topic selector. Then, instead of applying the mean-field assumption for the posterior approximation, we take full advantage of VAE to realize a non-mean-field approximation, which succeeds in preserving the connection of latent variables. Experimental results on three datasets (20NewsGroup, Wikitext-103, and SearchSnippets) show that our model outperforms recent topic models in terms of both topic quality and sparsity.
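
The idea of a Bernoulli topic selector can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the default values of `alpha` and `select_prob`, the fallback that keeps at least one topic, and the mask-then-renormalize step are all assumptions made for illustration. The abstract only states that auxiliary Bernoulli variables make each document focus on a subset of topics.

```python
import random


def sample_sparse_topic_proportions(num_topics, alpha=1.0, select_prob=0.3, rng=None):
    """Draw document-topic proportions gated by Bernoulli topic selectors.

    Hypothetical illustration only; the parameterization is assumed,
    not taken from the paper.
    """
    rng = rng or random.Random()

    # One Bernoulli selector per topic: 1 keeps the topic, 0 switches it off.
    selectors = [1 if rng.random() < select_prob else 0 for _ in range(num_topics)]
    if not any(selectors):
        # Assumption: force one active topic so the proportions stay well-defined.
        selectors[rng.randrange(num_topics)] = 1

    # Independent Gamma(alpha, 1) draws; normalizing them yields a Dirichlet sample.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]

    # Zero out unselected topics, then renormalize over the selected ones.
    # This equals a symmetric Dirichlet draw restricted to the selected subset.
    masked = [g * b for g, b in zip(gammas, selectors)]
    total = sum(masked)
    theta = [m / total for m in masked]
    return selectors, theta


if __name__ == "__main__":
    selectors, theta = sample_sparse_topic_proportions(10, rng=random.Random(0))
    print("selectors:", selectors)
    print("theta:    ", [round(t, 3) for t in theta])
```

In this sketch, topics whose selector is 0 receive exactly zero weight, so sparsity is controlled by `select_prob` rather than by the Dirichlet concentration `alpha` alone. That separation is one way to read the trade-off between topic quality and sparsity described in the abstract.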

Cite

Text

Chen et al. "Encouraging Sparsity in Neural Topic Modeling with Non-Mean-Field Inference." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43421-1_9

Markdown

[Chen et al. "Encouraging Sparsity in Neural Topic Modeling with Non-Mean-Field Inference." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/chen2023ecmlpkdd-encouraging/) doi:10.1007/978-3-031-43421-1_9

BibTeX

@inproceedings{chen2023ecmlpkdd-encouraging,
  title     = {{Encouraging Sparsity in Neural Topic Modeling with Non-Mean-Field Inference}},
  author    = {Chen, Jiayao and Wang, Rui and He, Jueying and Li, Mark Junjie},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2023},
  pages     = {142--158},
  doi       = {10.1007/978-3-031-43421-1_9},
  url       = {https://mlanthology.org/ecmlpkdd/2023/chen2023ecmlpkdd-encouraging/}
}