Scalable Nonparametric Bayesian Multilevel Clustering
Abstract
Multilevel clustering problems, in which content and contextual information are jointly clustered, are ubiquitous in modern data sets. Existing work on this problem is limited to small data sets because it relies on the Gibbs sampler. We address the problem of scaling up multilevel clustering in a Bayesian nonparametric setting, extending the MC2 model proposed by Nguyen et al. (2014). We ground our approach in mean-field and stochastic variational inference (SVI) theory. However, the interplay between content and context modeling makes a naive mean-field approach inefficient. We develop a tree-structured SVI algorithm that avoids the need to pass through the corpus repeatedly, as the Gibbs sampler does. More crucially, our method is immediately amenable to parallelization, facilitating a scalable distributed implementation of our algorithm on the Apache Spark platform. We conducted extensive experiments in a variety of domains, including text, images, and real-world user application activities. Direct comparison with the Gibbs sampler demonstrates that our method is an order of magnitude faster without loss of model quality. Our Spark-based implementation gains another order-of-magnitude speedup and can scale to large real-world data sets containing millions of documents and groups.
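To illustrate the general idea behind SVI mentioned in the abstract, here is a minimal sketch of a generic stochastic variational update on a toy Dirichlet-categorical model. It is not the paper's tree-structured algorithm and it does not implement the MC2 model. The function names (`step_size`, `svi_update`), the step-size parameters (`tau`, `kappa`), and the toy data are all assumptions made for illustration. Each iteration looks at one minibatch only, forms a noisy estimate of the global parameter as if the minibatch were the whole data set, and blends it into the running estimate with a decaying step size.

```python
import numpy as np

def step_size(t, tau=1.0, kappa=0.7):
    """Decaying step size rho_t = (t + tau)^(-kappa); kappa in (0.5, 1] is assumed."""
    return (t + tau) ** (-kappa)

def svi_update(lam, lam_hat, t, tau=1.0, kappa=0.7):
    """Blend the current global parameter with the minibatch-based estimate."""
    rho = step_size(t, tau, kappa)
    return (1.0 - rho) * lam + rho * lam_hat

# Toy data (hypothetical): draws from a 3-category distribution.
rng = np.random.default_rng(0)
true_probs = np.array([0.6, 0.3, 0.1])
data = rng.choice(3, size=10_000, p=true_probs)

prior = np.ones(3)       # symmetric Dirichlet prior
lam = prior.copy()       # global variational parameter
n = len(data)
batch_size = 100

for t in range(200):
    # Sample a minibatch instead of sweeping the full data set.
    batch = rng.choice(data, size=batch_size)
    counts = np.bincount(batch, minlength=3)
    # Estimate of the global parameter, rescaling the minibatch to full size.
    lam_hat = prior + (n / batch_size) * counts
    lam = svi_update(lam, lam_hat, t)

posterior_mean = lam / lam.sum()
print(posterior_mean)
```

Because each update touches only a minibatch, the cost per iteration does not grow with the size of the corpus, which is the property that makes this family of methods attractive for large data sets.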
Cite
Text
Huynh et al. "Scalable Nonparametric Bayesian Multilevel Clustering." Conference on Uncertainty in Artificial Intelligence, 2016.
Markdown
[Huynh et al. "Scalable Nonparametric Bayesian Multilevel Clustering." Conference on Uncertainty in Artificial Intelligence, 2016.](https://mlanthology.org/uai/2016/huynh2016uai-scalable/)
BibTeX
@inproceedings{huynh2016uai-scalable,
title = {{Scalable Nonparametric Bayesian Multilevel Clustering}},
author = {Huynh, Viet and Phung, Dinh Q. and Venkatesh, Svetha and Nguyen, XuanLong and Hoffman, Matthew D. and Bui, Hung Hai},
booktitle = {Conference on Uncertainty in Artificial Intelligence},
year = {2016},
url = {https://mlanthology.org/uai/2016/huynh2016uai-scalable/}
}