PG-Means: Learning the Number of Clusters in Data

Abstract

We present a novel algorithm called PG-means, which learns the number of clusters in a classical Gaussian mixture model. Our method is robust and efficient; it uses statistical hypothesis tests on one-dimensional projections of the data and model to determine whether the examples are well represented by the model. In doing so, we apply a statistical test to the entire model at once, not just on a per-cluster basis. We show that our method works well in difficult cases such as non-Gaussian data, overlapping clusters, eccentric clusters, high dimension, and many true clusters. Further, our new method provides a much more stable estimate of the number of clusters than existing methods.
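
The projection-and-test idea can be illustrated with a minimal sketch. The abstract does not name the hypothesis test, the number of projections, or the significance level, so everything below is an assumption: a Kolmogorov-Smirnov statistic stands in for the one-dimensional test, the directions are random unit vectors, and the function names (`ks_statistic`, `model_accepted`) and defaults are hypothetical. The mixture parameters are assumed to have been fitted elsewhere (for example by EM), and the asymptotic critical value used here ignores the fact that the model was estimated from the same data.

```python
import math
import random


def project_gaussian(mean, cov, direction):
    """Project a d-dimensional Gaussian onto a unit vector; return (mu, sigma)."""
    dim = len(direction)
    mu = sum(m * p for m, p in zip(mean, direction))
    var = sum(direction[i] * cov[i][j] * direction[j]
              for i in range(dim) for j in range(dim))
    return mu, math.sqrt(var)


def mixture_cdf(x, weights, params):
    """CDF of a one-dimensional Gaussian mixture evaluated at x."""
    return sum(w * 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
               for w, (mu, sigma) in zip(weights, params))


def ks_statistic(points, weights, means, covs, direction):
    """KS distance between the projected data and the projected mixture.

    Assumption: the KS statistic is used as the one-dimensional test.
    """
    params = [project_gaussian(m, c, direction) for m, c in zip(means, covs)]
    xs = sorted(sum(a * p for a, p in zip(pt, direction)) for pt in points)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = mixture_cdf(x, weights, params)
        # Compare the model CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def random_direction(dim, rng):
    """Draw a unit vector uniformly at random from the sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def model_accepted(points, weights, means, covs,
                   n_projections=12, alpha=0.001, seed=0):
    """Accept the whole model only if no random projection rejects it.

    n_projections and alpha are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    n = len(points)
    dim = len(points[0])
    # Asymptotic one-sample KS critical value at significance level alpha.
    critical = math.sqrt(-math.log(alpha / 2.0) / (2.0 * n))
    return all(
        ks_statistic(points, weights, means, covs,
                     random_direction(dim, rng)) <= critical
        for _ in range(n_projections)
    )
```

In a loop that learns the number of clusters, one would presumably fit a mixture with k components, call something like `model_accepted`, and increase k whenever the model is rejected. That outer loop is omitted here because the abstract does not describe it.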

Cite

Text

Feng and Hamerly. "PG-Means: Learning the Number of Clusters in Data." Neural Information Processing Systems, 2006.

Markdown

[Feng and Hamerly. "PG-Means: Learning the Number of Clusters in Data." Neural Information Processing Systems, 2006.](https://mlanthology.org/neurips/2006/feng2006neurips-pgmeans/)

BibTeX

@inproceedings{feng2006neurips-pgmeans,
  title     = {{PG-Means: Learning the Number of Clusters in Data}},
  author    = {Feng, Yu and Hamerly, Greg},
  booktitle = {Neural Information Processing Systems},
  year      = {2006},
  pages     = {393--400},
  url       = {https://mlanthology.org/neurips/2006/feng2006neurips-pgmeans/}
}