Bayesian Hierarchical Cross-Clustering

Abstract

Most clustering algorithms assume that all dimensions of the data can be described by a single structure. Cross-clustering (or multi-view clustering) allows multiple structures, each applying to a subset of the dimensions. We present a novel approach to cross-clustering, based on approximating the solution to a Cross Dirichlet Process mixture (CDPM) model [Shafto et al., 2006; Mansinghka et al., 2009]. Our bottom-up, deterministic approach results in a hierarchical clustering of dimensions and, at each node, a hierarchical clustering of data points. We also present a randomized approximation, based on a truncated hierarchy, that scales linearly in the number of levels. Results on synthetic and real-world data sets demonstrate that the cross-clustering-based algorithms perform as well as or better than the clustering-based algorithms, that our deterministic approach performs as well as the MCMC-based CDPM, and that the randomized approximation provides a remarkable speedup relative to the full deterministic approximation with minimal cost in predictive error.
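
As a rough illustration of what a cross-clustering looks like as a data structure, the sketch below represents a toy result in Python. It is not the paper's algorithm: the names `cross_clustering`, `rows_by_cluster`, and `is_valid`, as well as the 4-by-4 toy table and its labels, are hypothetical and chosen only to show that each view owns a subset of the dimensions and carries its own partition of the data points.

```python
from collections import defaultdict

# Hypothetical toy cross-clustering of a table with 4 rows and 4 columns.
# Each view owns a disjoint subset of the columns (dimensions) and has
# its own partition of the rows (data points).
cross_clustering = [
    {"columns": [0, 1], "row_labels": [0, 0, 1, 1]},
    {"columns": [2, 3], "row_labels": [0, 1, 0, 1]},
]


def rows_by_cluster(row_labels):
    """Group row indices by their cluster label within one view."""
    groups = defaultdict(list)
    for row, label in enumerate(row_labels):
        groups[label].append(row)
    return dict(groups)


def is_valid(views, n_rows, n_cols):
    """Check that the views partition the columns and label every row."""
    seen = sorted(c for view in views for c in view["columns"])
    covers_columns = seen == list(range(n_cols))
    labels_all_rows = all(len(v["row_labels"]) == n_rows for v in views)
    return covers_columns and labels_all_rows


# One row grouping per view; the same rows are grouped differently
# depending on which columns are considered.
summary = [rows_by_cluster(v["row_labels"]) for v in cross_clustering]
```

In this toy case the first view groups rows {0, 1} and {2, 3}, while the second groups rows {0, 2} and {1, 3}, which is the situation a single-structure clustering cannot express.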

Cite

Text

Li and Shafto. "Bayesian Hierarchical Cross-Clustering." Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.

Markdown

[Li and Shafto. "Bayesian Hierarchical Cross-Clustering." Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.](https://mlanthology.org/aistats/2011/li2011aistats-bayesian/)

BibTeX

@inproceedings{li2011aistats-bayesian,
  title     = {{Bayesian Hierarchical Cross-Clustering}},
  author    = {Li, Dazhuo and Shafto, Patrick},
  booktitle = {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
  year      = {2011},
  pages     = {443--451},
  volume    = {15},
  url       = {https://mlanthology.org/aistats/2011/li2011aistats-bayesian/}
}