Scalable Moment-Based Inference for Latent Dirichlet Allocation

Abstract

Topic models such as Latent Dirichlet Allocation are widely used methods for text analysis. Recently, moment-based inference with provable performance guarantees has been proposed for topic models. Compared with inference algorithms that approximate the maximum likelihood objective, moment-based inference offers theoretical guarantees on recovering the model parameters. One such inference method is tensor orthogonal decomposition, which requires only mild assumptions for exact recovery of topics. However, it suffers from a scalability issue due to the creation of dense, high-dimensional tensors. In this work, we propose a speedup technique that leverages the special structure of these tensors. It is efficient in both time and space, and requires scanning the corpus only twice. It improves over the state-of-the-art inference algorithm by one to three orders of magnitude while matching its inference quality.
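The tensor orthogonal decomposition the abstract refers to recovers the rank-1 components of a symmetric third-order moment tensor via power iteration with deflation. The sketch below is a minimal illustration of that core step on a small synthetic tensor; the function names and the simple deflation scheme are illustrative assumptions, not the paper's actual implementation (which avoids ever materializing the dense tensor).

```python
import numpy as np

def tensor_apply(T, u):
    # Contract the tensor along two modes: returns the vector T(I, u, u).
    return np.einsum('ijk,j,k->i', T, u, u)

def power_iteration(T, n_iter=100, seed=0):
    # Tensor power iteration: u <- T(I, u, u) / ||T(I, u, u)||.
    rng = np.random.default_rng(seed)
    u = rng.normal(size=T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        v = tensor_apply(T, u)
        u = v / np.linalg.norm(v)
    lam = tensor_apply(T, u) @ u  # eigenvalue = T(u, u, u)
    return lam, u

def orthogonal_decomposition(T, k):
    # Recover k (eigenvalue, eigenvector) pairs, deflating the tensor
    # by subtracting each recovered rank-1 component.
    T = T.copy()
    lams, vecs = [], []
    for _ in range(k):
        lam, v = power_iteration(T)
        lams.append(lam)
        vecs.append(v)
        T -= lam * np.einsum('i,j,k->ijk', v, v, v)
    return np.array(lams), np.array(vecs)
```

On an orthogonally decomposable tensor T = sum_i lambda_i v_i^(x3) with orthonormal v_i and positive lambda_i, each power-iteration run converges to one of the v_i, and deflation exposes the remaining components; this is the recovery step whose naive cost the paper's technique accelerates.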

Cite

Text

Wang et al. "Scalable Moment-Based Inference for Latent Dirichlet Allocation." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2014. doi:10.1007/978-3-662-44845-8_19

Markdown

[Wang et al. "Scalable Moment-Based Inference for Latent Dirichlet Allocation." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2014.](https://mlanthology.org/ecmlpkdd/2014/wang2014ecmlpkdd-scalable/) doi:10.1007/978-3-662-44845-8_19

BibTeX

@inproceedings{wang2014ecmlpkdd-scalable,
  title     = {{Scalable Moment-Based Inference for Latent Dirichlet Allocation}},
  author    = {Wang, Chi and Liu, Xueqing and Song, Yanglei and Han, Jiawei},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2014},
  pages     = {290--305},
  doi       = {10.1007/978-3-662-44845-8_19},
  url       = {https://mlanthology.org/ecmlpkdd/2014/wang2014ecmlpkdd-scalable/}
}