Coupled Hierarchical Dirichlet Process Mixtures for Simultaneous Clustering and Topic Modeling
Abstract
We propose a nonparametric Bayesian mixture model for grouped data that simultaneously optimizes topic extraction and group clustering while allowing all topics to be shared by all clusters. In addition, to achieve the computational efficiency required by today's large-scale data, we formulate our model so that the posterior distribution can be approximated with a closed-form variational Bayesian method. Experimental results on corpus data show that our model outperforms existing models, achieving a 22% improvement over the state-of-the-art model. Moreover, an experiment with location data from mobile phones shows that our model performs well in big data analysis.
Cite
Text
Shimosaka et al. "Coupled Hierarchical Dirichlet Process Mixtures for Simultaneous Clustering and Topic Modeling." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016. doi:10.1007/978-3-319-46227-1_15
Markdown
[Shimosaka et al. "Coupled Hierarchical Dirichlet Process Mixtures for Simultaneous Clustering and Topic Modeling." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016.](https://mlanthology.org/ecmlpkdd/2016/shimosaka2016ecmlpkdd-coupled/) doi:10.1007/978-3-319-46227-1_15
BibTeX
@inproceedings{shimosaka2016ecmlpkdd-coupled,
title = {{Coupled Hierarchical Dirichlet Process Mixtures for Simultaneous Clustering and Topic Modeling}},
author = {Shimosaka, Masamichi and Tsukiji, Takeshi and Tominaga, Shoji and Tsubouchi, Kota},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2016},
pages = {230--246},
doi = {10.1007/978-3-319-46227-1_15},
url = {https://mlanthology.org/ecmlpkdd/2016/shimosaka2016ecmlpkdd-coupled/}
}