Resampling Approach for Cluster Model Selection

Abstract

In cluster analysis, selecting the number of clusters is an “ill-posed” problem of crucial importance. In this paper we propose a resampling method for assessing cluster stability. Our model suggests that, at the “true” number of clusters, samples’ occurrences in clusters can be considered as realizations of the same random variable. Thus, similarity between different cluster solutions is measured by means of compound and simple probability metrics. Compound criteria result in validation rules employing the stability content of clusters. Simple probability metrics, in particular those based on kernels, provide more flexible geometrical criteria. We analyze several applications of probability metrics combined with methods intended to simulate cluster occurrences. Numerical experiments are provided to demonstrate and compare the different metrics and simulation approaches.
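
The idea of scoring a candidate number of clusters by how similar resampled cluster solutions are can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not specify the algorithm, so every detail here is an assumption made for illustration. That includes the hypothetical helper names `kmeans`, `mmd2`, and `instability`, the use of plain k-means, the RBF-kernel maximum mean discrepancy (MMD) as the kernel-based probability metric, the max-min rule for comparing two partitions, and the parameters `gamma`, `n_pairs`, and `sample_size`.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50, n_init=5):
    """Plain Lloyd's k-means with random restarts; returns labels of the best run."""
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        # Fancy indexing copies the rows, so updating centers leaves X untouched.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):  # keep the old center if a cluster empties
                    centers[j] = X[labels == j].mean(axis=0)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        cost = d2[np.arange(len(X)), labels].sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels


def mmd2(A, B, gamma=1.0):
    """Biased squared MMD between point sets A and B under an RBF kernel."""
    def gram(P, Q):
        d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-gamma * d2)

    value = gram(A, A).mean() + gram(B, B).mean() - 2.0 * gram(A, B).mean()
    return max(float(value), 0.0)  # clamp tiny negative rounding errors


def instability(X, k, rng, n_pairs=10, sample_size=100, gamma=1.0):
    """Average dissimilarity between cluster solutions of disjoint sample pairs.

    Assumed rule (not taken from the paper): each cluster of the first sample
    is compared with its closest cluster of the second sample, and the worst
    such match is the score for the pair. Lower values mean more stable.
    """
    scores = []
    for _ in range(n_pairs):
        idx = rng.choice(len(X), size=2 * sample_size, replace=False)
        S1, S2 = X[idx[:sample_size]], X[idx[sample_size:]]
        L1, L2 = kmeans(S1, k, rng), kmeans(S2, k, rng)
        C1 = [S1[L1 == j] for j in range(k) if np.any(L1 == j)]
        C2 = [S2[L2 == j] for j in range(k) if np.any(L2 == j)]
        scores.append(max(min(mmd2(a, b, gamma) for b in C2) for a in C1))
    return float(np.mean(scores))


# Synthetic data with three well-separated groups (illustrative only).
rng = np.random.default_rng(0)
blobs = np.vstack(
    [rng.normal(c, 0.5, size=(150, 2)) for c in ((0, 0), (5, 0), (0, 5))]
)
scores = {k: instability(blobs, k, rng) for k in range(2, 6)}
```

Under these assumptions, a candidate `k` with a low score in `scores` is one for which independently drawn samples produce clusters that look like draws from the same distributions. How the score behaves across `k` depends on the data, the kernel width `gamma`, and the clustering algorithm, so it should be read as a diagnostic rather than a definitive selection rule.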

Cite

Text

Volkovich et al. "Resampling Approach for Cluster Model Selection." Machine Learning, 2011. doi:10.1007/S10994-011-5236-9

Markdown

[Volkovich et al. "Resampling Approach for Cluster Model Selection." Machine Learning, 2011.](https://mlanthology.org/mlj/2011/volkovich2011mlj-resampling/) doi:10.1007/S10994-011-5236-9

BibTeX

@article{volkovich2011mlj-resampling,
  title     = {{Resampling Approach for Cluster Model Selection}},
  author    = {Volkovich, Zeev and Barzily, Zeev and Weber, Gerhard-Wilhelm and Toledano-Kitai, Dvora and Avros, Renata},
  journal   = {Machine Learning},
  year      = {2011},
  pages     = {209--248},
  doi       = {10.1007/S10994-011-5236-9},
  volume    = {85},
  url       = {https://mlanthology.org/mlj/2011/volkovich2011mlj-resampling/}
}