A New Mallows Distance Based Metric for Comparing Clusterings
Abstract
Despite the large number of algorithms developed for clustering, research on comparing clustering results is limited. In this paper, we propose a measure for comparing clustering results that tackles two issues insufficiently addressed or even overlooked by existing methods: (a) taking into account the distance between cluster representatives when assessing the similarity of clustering results; (b) constructing a unified framework that defines a distance for both hard and soft clustering and guarantees the triangle inequality under that definition. Our measure is derived from a complete and globally optimal matching between the clusters of two clustering results. We show that the distance is an instance of the Mallows distance, a metric between probability distributions in statistics. As a result, the defined distance inherits desirable properties from the Mallows distance. Experiments show that our clustering distance measure successfully handles cases that are difficult for other measures.
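To make the idea of a globally optimal matching between cluster representatives concrete, the following is a minimal sketch, not the formulation from the paper. It assumes a simplified special case: both clusterings have the same number of clusters, every cluster carries equal weight 1/k, and each cluster is represented by a centroid compared with Euclidean distance. Under those assumptions an optimal transport plan is a one-to-one matching, so a brute-force search over permutations is exact for small k. The function name `mallows_distance_equal_weights` and the parameter `p` are hypothetical, chosen for illustration.

```python
from itertools import permutations
from math import dist


def mallows_distance_equal_weights(centers_a, centers_b, p=2):
    """Mallows (Wasserstein) distance between two sets of cluster centers.

    Assumption (not from the paper): same number of clusters k in both
    clusterings, and every cluster has weight 1/k.
    """
    k = len(centers_a)
    if k == 0 or k != len(centers_b):
        raise ValueError("this sketch needs two non-empty, equally sized center lists")

    # Pairwise cost: p-th power of the Euclidean distance between representatives.
    cost = [[dist(a, b) ** p for b in centers_b] for a in centers_a]

    # With equal weights, some optimal plan is a permutation, so an
    # exhaustive search over matchings is exact (practical for small k only).
    best = min(
        sum(cost[i][perm[i]] for i in range(k))
        for perm in permutations(range(k))
    )
    return (best / k) ** (1 / p)


if __name__ == "__main__":
    a = [(0.0, 0.0), (10.0, 0.0)]
    b = [(10.0, 1.0), (0.0, 1.0)]  # same structure, shifted up by 1, listed in swapped order
    print(mallows_distance_equal_weights(a, b))
```

Because the matching is optimized globally, the order in which clusters are listed does not matter, and two clusterings whose representatives lie close together receive a small distance. Unequal cluster weights or soft memberships would require solving a general transportation problem rather than searching over permutations.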
Cite
Text
Zhou et al. "A New Mallows Distance Based Metric for Comparing Clusterings." International Conference on Machine Learning, 2005. doi:10.1145/1102351.1102481
Markdown
[Zhou et al. "A New Mallows Distance Based Metric for Comparing Clusterings." International Conference on Machine Learning, 2005.](https://mlanthology.org/icml/2005/zhou2005icml-new/) doi:10.1145/1102351.1102481
BibTeX
@inproceedings{zhou2005icml-new,
title = {{A New Mallows Distance Based Metric for Comparing Clusterings}},
author = {Zhou, Ding and Li, Jia and Zha, Hongyuan},
booktitle = {International Conference on Machine Learning},
year = {2005},
pages = {1028-1035},
doi = {10.1145/1102351.1102481},
url = {https://mlanthology.org/icml/2005/zhou2005icml-new/}
}