Towards a Simple Clustering Criterion Based on Minimum Length Encoding
Abstract
We propose a simple and intuitive clustering evaluation criterion based on the minimum description length principle, which yields a particularly simple way of describing and encoding a set of examples. The basic idea is to view a clustering as a restriction of the attribute domains, given an example’s cluster membership. As a special operational case, we develop the so-called rectangular uniform message length measure, which can be used to evaluate clusterings described as sets of hyper-rectangles. We theoretically prove that this measure punishes cluster boundaries in regions of uniform instance distribution (i.e., unintuitive clusterings), and we experimentally compare a simple clustering algorithm using this measure with the well-known algorithms KMeans and AutoClass.
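The abstract does not state the measure's formula, so the following is only a rough sketch of the general idea, not the authors' definition. It assumes a two-part code: each cluster's hyper-rectangle is described by two bounds per attribute on a grid of a fixed `precision`, and each example is then encoded by its cluster index plus its position inside that cluster's restricted attribute ranges, assuming a uniform distribution. The function name, the `precision` parameter, and the exact cost terms are all hypothetical choices made for illustration.

```python
import math


def uniform_message_length(points, labels, precision=0.01):
    """Illustrative code length in bits for a hyper-rectangle clustering.

    Assumption: this is NOT the formula from the paper, only a two-part
    code in the same spirit (model cost + data cost given the model).
    """
    dims = len(points[0])

    # Group the examples by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    def cells(width):
        # Number of distinguishable values in a range at the given precision.
        return max(1, math.ceil(width / precision))

    # Unrestricted attribute domains, taken from the data itself.
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    # Model cost: every cluster needs a lower and an upper bound per attribute,
    # each bound being a position in the unrestricted domain.
    bits_per_rectangle = sum(
        2 * math.log2(cells(highs[d] - lows[d])) for d in range(dims)
    )
    model_bits = bits_per_rectangle * len(clusters)

    # Data cost: cluster index, then a position inside the restricted ranges.
    index_bits = math.log2(len(clusters)) if len(clusters) > 1 else 0.0
    data_bits = 0.0
    for members in clusters.values():
        bits_per_example = index_bits
        for d in range(dims):
            width = max(p[d] for p in members) - min(p[d] for p in members)
            bits_per_example += math.log2(cells(width))
        data_bits += bits_per_example * len(members)

    return model_bits + data_bits
```

Under these assumptions, splitting two well-separated groups into two rectangles shortens the total message, because the narrower ranges save more bits per example than the extra rectangle costs. Cutting a uniformly filled range in half saves almost nothing per example (the one bit gained from halving the range is spent on the cluster index) while still paying for a second rectangle, which is in line with the penalty on boundaries in uniform regions that the abstract describes.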
Cite
Text
Ludl and Widmer. "Towards a Simple Clustering Criterion Based on Minimum Length Encoding." European Conference on Machine Learning, 2002. doi:10.1007/3-540-36755-1_22
Markdown
[Ludl and Widmer. "Towards a Simple Clustering Criterion Based on Minimum Length Encoding." European Conference on Machine Learning, 2002.](https://mlanthology.org/ecmlpkdd/2002/ludl2002ecml-simple/) doi:10.1007/3-540-36755-1_22
BibTeX
@inproceedings{ludl2002ecml-simple,
title = {{Towards a Simple Clustering Criterion Based on Minimum Length Encoding}},
author = {Ludl, Marcus-Christopher and Widmer, Gerhard},
booktitle = {European Conference on Machine Learning},
year = {2002},
pages = {258-269},
doi = {10.1007/3-540-36755-1_22},
url = {https://mlanthology.org/ecmlpkdd/2002/ludl2002ecml-simple/}
}