Document Informed Neural Autoregressive Topic Models with Distributional Prior

Abstract

We address two challenges in topic models: (1) Context information around words helps in determining their actual meaning, e.g., “networks” used in the contexts of artificial neural networks vs. biological neuron networks. Generative topic models infer topic-word distributions, taking little or no context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document in a language modeling fashion. The proposed model is named iDocNADE. (2) Due to the small number of word occurrences (i.e., lack of context) in short texts and data sparsity in a corpus of few documents, applying topic models to such texts is challenging. Therefore, we propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: we use embeddings as a distributional prior. The proposed variants are named DocNADEe and iDocNADEe. We present novel neural autoregressive topic model variants that consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability (topic coherence) and applicability (retrieval and classification) over 7 long-text and 8 short-text datasets from diverse domains.
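To make the mechanism concrete, here is a minimal sketch of the autoregressive factorization with an embedding prior, assuming the standard DocNADE-style formulation; the symbols W (topic-word matrix), E (pretrained word-embedding matrix), λ (mixing weight), U and b (output weights and bias), g (nonlinearity) and c (hidden bias) are illustrative notation, not definitions taken from this page:

\begin{align*}
  p(v) &= \prod_{i=1}^{D} p(v_i \mid v_{<i}) \\
  \mathbf{h}_i(v_{<i}) &= g\Big(\mathbf{c} + \sum_{k<i} W_{:,v_k} + \lambda \sum_{k<i} E_{:,v_k}\Big) \\
  p(v_i = w \mid v_{<i}) &= \frac{\exp\!\big(b_w + U_{w,:}\,\mathbf{h}_i(v_{<i})\big)}{\sum_{w'} \exp\!\big(b_{w'} + U_{w',:}\,\mathbf{h}_i(v_{<i})\big)}
\end{align*}

With λ = 0 this sketch reduces to plain DocNADE; the “informed” variants (iDocNADE, iDocNADEe) additionally compute a backward hidden state over the words following position i, so each word is predicted from both its preceding and following context.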

Cite

Text

Gupta et al. "Document Informed Neural Autoregressive Topic Models with Distributional Prior." AAAI Conference on Artificial Intelligence, 2019. doi:10.1609/AAAI.V33I01.33016505

Markdown

[Gupta et al. "Document Informed Neural Autoregressive Topic Models with Distributional Prior." AAAI Conference on Artificial Intelligence, 2019.](https://mlanthology.org/aaai/2019/gupta2019aaai-document/) doi:10.1609/AAAI.V33I01.33016505

BibTeX

@inproceedings{gupta2019aaai-document,
  title     = {{Document Informed Neural Autoregressive Topic Models with Distributional Prior}},
  author    = {Gupta, Pankaj and Chaudhary, Yatin and Buettner, Florian and Schütze, Hinrich},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {6505--6512},
  doi       = {10.1609/AAAI.V33I01.33016505},
  url       = {https://mlanthology.org/aaai/2019/gupta2019aaai-document/}
}