Towards a Simple Clustering Criterion Based on Minimum Length Encoding

Abstract

We propose a simple and intuitive clustering evaluation criterion based on the minimum description length (MDL) principle; the criterion yields a particularly simple way of describing and encoding a set of examples. The basic idea is to view a clustering as a restriction of the attribute domains, given an example's cluster membership. As a special operational case, we develop the so-called rectangular uniform message length measure, which can be used to evaluate clusterings described as sets of hyper-rectangles. We prove theoretically that this measure penalizes cluster boundaries in regions of uniform instance distribution (i.e., unintuitive clusterings), and we experimentally compare a simple clustering algorithm using this measure with the well-known algorithms KMeans and AutoClass.
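The abstract's core idea, that knowing an example's cluster restricts the attribute domains and so shortens the example's encoding, can be illustrated with a toy two-part code. This is a minimal sketch under stated assumptions, not the authors' rectangular uniform message length measure: the function `message_length`, the precision `eps`, the fixed per-boundary model cost `bits_per_bound`, and the entropy coding of cluster labels are all hypothetical simplifications. Under these assumptions it shows the qualitative effect described above: in a region of uniform density, an extra boundary saves no data bits and therefore only adds model cost.

```python
import math
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]  # one (low, high) interval per attribute


def message_length(points: Sequence[Sequence[float]], boxes: List[Box],
                   eps: float = 0.01, bits_per_bound: float = 16.0) -> float:
    """Toy two-part message length (in bits) of `points`, given a clustering
    described by hyper-rectangles `boxes`. Hypothetical simplified cost."""
    n = len(points)
    dims = len(boxes[0])
    # Model part (assumption): every rectangle has 2 boundaries per attribute,
    # each costing a fixed `bits_per_bound` bits.
    model_bits = len(boxes) * 2 * dims * bits_per_bound

    counts = [0] * len(boxes)
    data_bits = 0.0
    for p in points:
        # Cluster membership: the first rectangle containing the point
        # (assumes the rectangles cover every point).
        k = next(i for i, box in enumerate(boxes)
                 if all(lo <= x <= hi for x, (lo, hi) in zip(p, box)))
        counts[k] += 1
        # Given its cluster, each coordinate is coded uniformly inside the
        # restricted interval at precision eps: log2(width / eps) bits.
        data_bits += sum(math.log2((hi - lo) / eps) for lo, hi in boxes[k])

    # Cluster labels are coded with their empirical entropy: -c * log2(c / n).
    data_bits -= sum(c * math.log2(c / n) for c in counts if c)
    return model_bits + data_bits


if __name__ == "__main__":
    # 100 evenly spaced points in [0, 1]: a uniform instance distribution.
    uniform = [((i + 0.5) / 100,) for i in range(100)]
    print(message_length(uniform, [[(0.0, 1.0)]]))
    print(message_length(uniform, [[(0.0, 0.5)], [(0.5, 1.0)]]))
```

On the uniform points, splitting at 0.5 saves one bit per coordinate but spends one bit per label, so the two-rectangle clustering is longer by exactly its extra boundary cost (2 × 16 = 32 bits with the defaults). On data concentrated in [0, 0.1] and [0.9, 1], the same kind of two-rectangle split is much shorter than a single rectangle, because the narrower intervals save more bits than the labels cost.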

Cite

Text

Ludl and Widmer. "Towards a Simple Clustering Criterion Based on Minimum Length Encoding." European Conference on Machine Learning, 2002. doi:10.1007/3-540-36755-1_22

Markdown

[Ludl and Widmer. "Towards a Simple Clustering Criterion Based on Minimum Length Encoding." European Conference on Machine Learning, 2002.](https://mlanthology.org/ecmlpkdd/2002/ludl2002ecml-simple/) doi:10.1007/3-540-36755-1_22

BibTeX

@inproceedings{ludl2002ecml-simple,
  title     = {{Towards a Simple Clustering Criterion Based on Minimum Length Encoding}},
  author    = {Ludl, Marcus-Christopher and Widmer, Gerhard},
  booktitle = {European Conference on Machine Learning},
  year      = {2002},
  pages     = {258--269},
  doi       = {10.1007/3-540-36755-1_22},
  url       = {https://mlanthology.org/ecmlpkdd/2002/ludl2002ecml-simple/}
}