Refining Initial Points for K-Means Clustering
Abstract
Practical approaches to clustering use an iterative procedure (e.g. K-Means, EM) which converges to one of numerous local minima. It is known that these iterative techniques are especially sensitive to initial starting conditions. We present a procedure for computing a refined starting condition from a given initial one that is based on an efficient technique for estimating the modes of a distribution. The refined initial starting condition allows the iterative algorithm to converge to a "better" local minimum. The procedure is applicable to a wide class of clustering algorithms for both discrete and continuous data. We demonstrate the application of this method to the popular K-Means clustering algorithm and show that refined initial starting points indeed lead to improved solutions. Refinement run time is considerably lower than the time required to cluster the full database. The method is scalable and can be coupled with a scalable clustering algorithm to address the large-scale cl...
Cite
Text
Bradley and Fayyad. "Refining Initial Points for K-Means Clustering." International Conference on Machine Learning, 1998.Markdown
[Bradley and Fayyad. "Refining Initial Points for K-Means Clustering." International Conference on Machine Learning, 1998.](https://mlanthology.org/icml/1998/bradley1998icml-refining/)BibTeX
@inproceedings{bradley1998icml-refining,
title = {{Refining Initial Points for K-Means Clustering}},
author = {Bradley, Paul S. and Fayyad, Usama M.},
booktitle = {International Conference on Machine Learning},
year = {1998},
pages = {91-99},
url = {https://mlanthology.org/icml/1998/bradley1998icml-refining/}
}