Variational Autoencoders for Sparse and Overdispersed Discrete Data
Abstract
Many applications, such as text modelling, high-throughput sequencing, and recommender systems, require analysing sparse, high-dimensional, and overdispersed discrete (count or binary) data. Recent deep probabilistic models based on variational autoencoders (VAEs) have shown promising results on discrete data, but their modelling performance can suffer from a limited ability to capture overdispersion and from model misspecification. To address these issues, we develop a VAE-based framework that uses the negative binomial distribution as the data distribution, and we provide an analysis of its properties vis-à-vis other models. We conduct extensive experiments on three problems in discrete data analysis: text analysis/topic modelling, collaborative filtering, and multi-label learning. Our models outperform state-of-the-art approaches on these problems while capturing the phenomenon of overdispersion more effectively.
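To make the modelling idea concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of a VAE whose decoder parameterises a negative binomial likelihood over count vectors, so that a learned per-feature dispersion can absorb overdispersion that a Poisson or multinomial likelihood cannot. The names (`NBDecoderVAE`, `elbo_loss`) and all hyperparameters are illustrative assumptions, and PyTorch is used purely for convenience; the paper's actual construction may differ.

```python
# Hypothetical sketch: VAE with a negative binomial data distribution for count data.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NBDecoderVAE(nn.Module):
    def __init__(self, vocab_size, latent_dim=32, hidden_dim=256):
        super().__init__()
        # Encoder: count vector -> Gaussian posterior parameters q(z | x)
        self.enc = nn.Sequential(nn.Linear(vocab_size, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: z -> negative binomial parameters (dispersion r, success logits)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden_dim), nn.ReLU())
        self.log_r = nn.Linear(hidden_dim, vocab_size)
        self.logits = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        d = self.dec(z)
        r = F.softplus(self.log_r(d)) + 1e-4   # strictly positive dispersion
        logits = self.logits(d)
        return r, logits, mu, logvar

def elbo_loss(x, r, logits, mu, logvar):
    # Negative binomial log-likelihood: variance exceeds the mean, so
    # overdispersed counts are modelled more faithfully than with a Poisson.
    nb = torch.distributions.NegativeBinomial(total_count=r, logits=logits)
    rec = nb.log_prob(x).sum(dim=-1)
    # KL divergence between q(z | x) and the standard normal prior
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
    return -(rec - kl).mean()
```

Parameterising the likelihood by a per-feature dispersion and logits is only one convenient choice; the sketch is meant to show where overdispersion enters the ELBO, not to reproduce the paper's model.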
Cite
Text
Zhao et al. "Variational Autoencoders for Sparse and Overdispersed Discrete Data." Artificial Intelligence and Statistics, 2020.
Markdown
[Zhao et al. "Variational Autoencoders for Sparse and Overdispersed Discrete Data." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/zhao2020aistats-variational/)
BibTeX
@inproceedings{zhao2020aistats-variational,
title = {{Variational Autoencoders for Sparse and Overdispersed Discrete Data}},
author = {Zhao, He and Rai, Piyush and Du, Lan and Buntine, Wray and Phung, Dinh and Zhou, Mingyuan},
booktitle = {Artificial Intelligence and Statistics},
year = {2020},
pages = {1684--1694},
volume = {108},
url = {https://mlanthology.org/aistats/2020/zhao2020aistats-variational/}
}