Extreme Stochastic Variational Inference: Distributed Inference for Large Scale Mixture Models
Abstract
Mixtures of exponential family models are among the most fundamental and widely used statistical models. Stochastic variational inference (SVI), the state-of-the-art algorithm for parameter estimation in such models, is inherently serial. Moreover, it requires the parameters to fit in the memory of a single processor; this poses serious limitations on scalability when the number of parameters is in the billions. In this paper, we present extreme stochastic variational inference (ESVI), a distributed, asynchronous, and lock-free algorithm for variational inference in mixture models on massive real-world datasets. ESVI overcomes the limitations of SVI by requiring each processor to access only a subset of the data and a subset of the parameters, thus providing data and model parallelism simultaneously. Our empirical study demonstrates that ESVI not only outperforms VI and SVI in wall-clock time, but also achieves a better-quality solution. To further speed up computation and save memory when fitting a large number of topics, we propose a variant, ESVI-TOPK, which maintains only the top-k most important topics. Empirically, we find that using the top 25% of topics suffices to achieve the same accuracy as storing all the topics.
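The top-k truncation idea behind ESVI-TOPK can be illustrated with a small sketch: per data point, keep only the k largest variational topic weights and renormalize them into a distribution. This is a hypothetical illustration of the general idea, not the paper's implementation; the function name `topk_truncate` and the example weights are assumptions.

```python
import numpy as np

def topk_truncate(gamma, k):
    """Keep only the k largest variational topic weights and renormalize.

    Hypothetical sketch of the ESVI-TOPK idea: only the top-k most
    important topics are stored and updated for each data point.
    """
    idx = np.argpartition(gamma, -k)[-k:]  # indices of the k largest weights
    sparse = np.zeros_like(gamma)
    sparse[idx] = gamma[idx]
    return sparse / sparse.sum()           # renormalize to a distribution

# Example: 8 topics, keep the top 25% (k = 2)
gamma = np.array([0.02, 0.40, 0.05, 0.30, 0.03, 0.10, 0.06, 0.04])
print(topk_truncate(gamma, 2))
```

Only k entries per data point are stored, so memory and update cost scale with k rather than with the total number of topics.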
Cite
Text

Zhang et al. "Extreme Stochastic Variational Inference: Distributed Inference for Large Scale Mixture Models." Artificial Intelligence and Statistics, 2019.

Markdown

[Zhang et al. "Extreme Stochastic Variational Inference: Distributed Inference for Large Scale Mixture Models." Artificial Intelligence and Statistics, 2019.](https://mlanthology.org/aistats/2019/zhang2019aistats-extreme/)

BibTeX
@inproceedings{zhang2019aistats-extreme,
title = {{Extreme Stochastic Variational Inference: Distributed Inference for Large Scale Mixture Models}},
author = {Zhang, Jiong and Raman, Parameswaran and Ji, Shihao and Yu, Hsiang-Fu and Vishwanathan, S.V.N. and Dhillon, Inderjit},
booktitle = {Artificial Intelligence and Statistics},
year = {2019},
pages = {935--943},
volume = {89},
url = {https://mlanthology.org/aistats/2019/zhang2019aistats-extreme/}
}