Probabilistic Size-Constrained Microclustering

Abstract

Microclustering refers to clustering models that produce small clusters or, equivalently, to models in which cluster sizes grow sublinearly with the number of samples. We formulate probabilistic microclustering models by assigning a prior distribution on cluster sizes, and in particular consider microclustering models with explicit bounds on cluster size. The combinatorial constraints complicate full Bayesian inference, but we develop a Gibbs sampling algorithm that efficiently samples from the joint cluster allocation of all data points. We empirically demonstrate the computational efficiency of the algorithm on problem instances of varying difficulty.

Cite

Text

Klami and Jitta. "Probabilistic Size-Constrained Microclustering." Conference on Uncertainty in Artificial Intelligence, 2016.

Markdown

[Klami and Jitta. "Probabilistic Size-Constrained Microclustering." Conference on Uncertainty in Artificial Intelligence, 2016.](https://mlanthology.org/uai/2016/klami2016uai-probabilistic/)

BibTeX

@inproceedings{klami2016uai-probabilistic,
  title     = {{Probabilistic Size-Constrained Microclustering}},
  author    = {Klami, Arto and Jitta, Aditya},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2016},
  url       = {https://mlanthology.org/uai/2016/klami2016uai-probabilistic/}
}