A Left-to-Right Algorithm for Likelihood Estimation in Gamma-Poisson Factor Analysis

Abstract

Computing the probability of unseen documents is a natural evaluation task in topic modeling. Previous work has addressed this problem for the well-known Latent Dirichlet Allocation (LDA) model. However, the same problem for a more general class of topic models, referred to here as Gamma-Poisson Factor Analysis (GaP-FA), remains unexplored, which hampers a fair comparison between models. Recent findings on the exact marginal likelihood of GaP-FA enable the derivation of a closed-form expression. In this paper, we show that the cost of computing it exactly grows exponentially with the number of topics and non-zero words in a document, so it is tractable only for relatively small models and short documents. Experiments on various corpora also indicate that existing methods in the literature are unlikely to estimate this probability accurately. With that in mind, we propose L2R, a left-to-right sequential sampler that decomposes the document probability into a product of conditionals and estimates them separately. We then confirm that our estimator converges and is unbiased for both small and large collections. Code related to this paper is available at: https://github.com/jcapde/L2R, https://doi.org/10.7910/DVN/GDTAAC.
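The left-to-right decomposition mentioned above can be written out in generic form. This is a sketch of the chain-rule idea only, not the paper's exact estimator. The notation is assumed for illustration and does not come from the abstract: $\mathbf{x} = (x_1, \dots, x_N)$ is the document's word-count vector, $x_{<n}$ the counts preceding position $n$, $\mathbf{z}^{(s)}$ a latent-variable sample, and $S$ the number of samples per conditional.

```latex
% Chain rule: the document probability as a product of conditionals.
p(\mathbf{x}) = \prod_{n=1}^{N} p(x_n \mid x_{<n})

% Generic Monte Carlo estimate of each conditional (assumed form),
% averaging over S latent samples drawn given the preceding counts.
p(x_n \mid x_{<n}) \approx \frac{1}{S} \sum_{s=1}^{S}
  p\bigl(x_n \mid x_{<n}, \mathbf{z}^{(s)}\bigr),
\qquad \mathbf{z}^{(s)} \sim p(\mathbf{z} \mid x_{<n})
```

Under this reading, each factor is estimated separately and the estimates are multiplied, which avoids summing over all latent configurations at once.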

Cite

Text

Capdevila et al. "A Left-to-Right Algorithm for Likelihood Estimation in Gamma-Poisson Factor Analysis." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2018. doi:10.1007/978-3-030-10928-8_38

Markdown

[Capdevila et al. "A Left-to-Right Algorithm for Likelihood Estimation in Gamma-Poisson Factor Analysis." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2018.](https://mlanthology.org/ecmlpkdd/2018/capdevila2018ecmlpkdd-lefttoright/) doi:10.1007/978-3-030-10928-8_38

BibTeX

@inproceedings{capdevila2018ecmlpkdd-lefttoright,
  title     = {{A Left-to-Right Algorithm for Likelihood Estimation in Gamma-Poisson Factor Analysis}},
  author    = {Capdevila, Joan and Cerquides, Jesús and Torres, Jordi and Petitjean, François and Buntine, Wray L.},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2018},
  pages     = {638-654},
  doi       = {10.1007/978-3-030-10928-8_38},
  url       = {https://mlanthology.org/ecmlpkdd/2018/capdevila2018ecmlpkdd-lefttoright/}
}