Fully Sparse Topic Models

Abstract

In this paper, we propose Fully Sparse Topic Model (FSTM) for modeling large collections of documents. Three key properties of the model are: (1) the inference algorithm converges in linear time, (2) learning of topics is simply a multiplication of two sparse matrices, (3) it provides a principled way to directly trade off sparsity of solutions against inference quality and running time. These properties enable us to learn sparse topics quickly, to infer sparse latent representations of documents, and to significantly reduce storage memory. We show that inference in FSTM is actually MAP inference with an implicit prior. Extensive experiments show that FSTM can perform substantially better than various existing topic models across different performance measures. Finally, our parallel implementation can handily learn thousands of topics from large corpora with millions of terms.
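The abstract's property (2) — that topic learning reduces to a multiplication of two sparse matrices — can be illustrated with a minimal sketch. This is not the authors' implementation; the matrices `D` (document-term counts) and `theta` (sparse document-topic representations) are randomly generated stand-ins, and the row-normalized product `theta.T @ D` plays the role of the topic update:

```python
import numpy as np
from scipy import sparse

n_docs, n_terms, n_topics = 100, 500, 10

# Hypothetical sparse document-term matrix D (word counts).
D = sparse.random(n_docs, n_terms, density=0.02, format="csr", random_state=0)
# Hypothetical sparse document-topic matrix theta, as produced by a sparse
# inference step (in FSTM, inference yields sparse representations).
theta = sparse.random(n_docs, n_topics, density=0.2, format="csr", random_state=1)

# Topic update as a sparse-sparse product: beta proportional to theta^T @ D,
# then row-normalize so each topic is a distribution over terms.
beta = (theta.T @ D).toarray()
row_sums = beta.sum(axis=1, keepdims=True)
beta = np.divide(beta, row_sums, out=np.zeros_like(beta), where=row_sums > 0)
print(beta.shape)  # one row per topic, one column per term
```

Because both factors are sparse, the product costs far less than a dense update, which is the source of the memory and speed savings the abstract claims.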

Cite

Text

Than and Ho. "Fully Sparse Topic Models." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012. doi:10.1007/978-3-642-33460-3_37

Markdown

[Than and Ho. "Fully Sparse Topic Models." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012.](https://mlanthology.org/ecmlpkdd/2012/than2012ecmlpkdd-fully/) doi:10.1007/978-3-642-33460-3_37

BibTeX

@inproceedings{than2012ecmlpkdd-fully,
  title     = {{Fully Sparse Topic Models}},
  author    = {Than, Khoat and Ho, Tu Bao},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2012},
  pages     = {490--505},
  doi       = {10.1007/978-3-642-33460-3_37},
  url       = {https://mlanthology.org/ecmlpkdd/2012/than2012ecmlpkdd-fully/}
}