Fully Sparse Topic Models
Abstract
In this paper, we propose Fully Sparse Topic Model (FSTM) for modeling large collections of documents. Three key properties of the model are: (1) the inference algorithm converges in linear time, (2) learning of topics is simply a multiplication of two sparse matrices, (3) it provides a principled way to directly trade off sparsity of solutions against inference quality and running time. These properties enable us to speedily learn sparse topics, to infer sparse latent representations of documents, and to substantially reduce the memory needed for storage. We show that inference in FSTM is actually MAP inference with an implicit prior. Extensive experiments show that FSTM can perform substantially better than various existing topic models by different performance measures. Finally, our parallel implementation can handily learn thousands of topics from large corpora with millions of terms.
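The abstract's second property, that topic learning reduces to a product of two sparse matrices, can be illustrated with a small sketch. This is not the authors' implementation; the matrix names (`counts`, `theta`, `beta`) and toy shapes are assumptions chosen for the demo, using SciPy's sparse matrix product for the update.

```python
import numpy as np
from scipy import sparse

# Illustrative sketch only: the topic-update step is a single product of two
# sparse matrices -- the per-document topic proportions and the document-term
# count matrix. All names and shapes below are assumed for this toy example.

# Sparse document-term count matrix (4 documents x 6 vocabulary terms)
counts = sparse.csr_matrix(np.array([
    [2, 0, 1, 0, 0, 0],
    [0, 3, 0, 1, 0, 0],
    [0, 0, 0, 0, 2, 1],
    [1, 0, 0, 0, 0, 2],
], dtype=float))

# Sparse per-document topic proportions (4 documents x 2 topics),
# e.g. as produced by a sparsity-inducing inference step
theta = sparse.csr_matrix(np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]))

# Topic update as one sparse-sparse product: (2 x 4) @ (4 x 6) -> 2 x 6
beta_unnorm = theta.T @ counts

# Row-normalize to obtain topic distributions over the vocabulary
beta = beta_unnorm.toarray()
beta /= beta.sum(axis=1, keepdims=True)
print(beta.shape)  # (2, 6)
```

Because both factors are sparse, the product touches only nonzero entries, which is what makes the learning step cheap in both time and memory.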
Cite
Text
Than and Ho. "Fully Sparse Topic Models." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012. doi:10.1007/978-3-642-33460-3_37
Markdown
[Than and Ho. "Fully Sparse Topic Models." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012.](https://mlanthology.org/ecmlpkdd/2012/than2012ecmlpkdd-fully/) doi:10.1007/978-3-642-33460-3_37
BibTeX
@inproceedings{than2012ecmlpkdd-fully,
title = {{Fully Sparse Topic Models}},
author = {Than, Khoat and Ho, Tu Bao},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2012},
pages = {490--505},
doi = {10.1007/978-3-642-33460-3_37},
url = {https://mlanthology.org/ecmlpkdd/2012/than2012ecmlpkdd-fully/}
}