Relating Clustering Stability to Properties of Cluster Boundaries

Abstract

In this paper, we investigate stability-based methods for cluster model selection, in particular to select the number K of clusters. The scenario under consideration is that clustering is performed by minimizing a certain clustering quality function, and that a unique global minimizer exists. On the one hand we show that stability can be upper bounded by certain properties of the optimal clustering, namely by the mass in a small tube around the cluster boundaries. On the other hand, we provide counterexamples which show that a reverse statement is not true in general. Finally, we give some examples and arguments why, from a theoretic point of view, using clustering stability in a high sample setting can be problematic. It can be seen that distribution-free guarantees bounding the difference between the finite sample stability and the "true stability" cannot exist, unless one makes strong assumptions on the underlying distribution.

Cite

Text

Ben-David and von Luxburg. "Relating Clustering Stability to Properties of Cluster Boundaries." Annual Conference on Computational Learning Theory, 2008.

Markdown

[Ben-David and von Luxburg. "Relating Clustering Stability to Properties of Cluster Boundaries." Annual Conference on Computational Learning Theory, 2008.](https://mlanthology.org/colt/2008/bendavid2008colt-relating/)

BibTeX

@inproceedings{bendavid2008colt-relating,
  title     = {{Relating Clustering Stability to Properties of Cluster Boundaries}},
  author    = {Ben-David, Shai and von Luxburg, Ulrike},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2008},
  pages     = {379-390},
  url       = {https://mlanthology.org/colt/2008/bendavid2008colt-relating/}
}