PG-Means: Learning the Number of Clusters in Data
Abstract
We present a novel algorithm called PG-means that learns the number of clusters in a classical Gaussian mixture model. Our method is robust and efficient; it uses statistical hypothesis tests on one-dimensional projections of the data and model to determine whether the examples are well represented by the model. In doing so, we apply a statistical test to the entire model at once, not just on a per-cluster basis. We show that our method works well in difficult cases such as non-Gaussian data, overlapping clusters, eccentric clusters, high dimension, and many true clusters. Further, our new method provides a much more stable estimate of the number of clusters than existing methods.
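To make the idea of testing one-dimensional projections of both the data and the model concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the test or how projections are chosen, so the Kolmogorov-Smirnov test, the random unit-vector projections, the number of projections, and the function name `projected_ks_pvalues` are all assumptions made for illustration. The sketch relies on the fact that a Gaussian mixture projected onto a direction u is again a one-dimensional Gaussian mixture with means u·μ_j and variances uᵀΣ_j u.

```python
import numpy as np
from scipy.stats import kstest, norm


def projected_ks_pvalues(X, weights, means, covs, n_projections=12, seed=0):
    """KS-test the data against a Gaussian mixture along random 1-D projections.

    Hypothetical helper for illustration; the test choice (KS) and the
    random projections are assumptions, not taken from the abstract.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)   # shape (k, d)
    covs = np.asarray(covs, dtype=float)     # shape (k, d, d)
    pvalues = []
    for _ in range(n_projections):
        # Random unit direction in the data space.
        u = rng.standard_normal(X.shape[1])
        u /= np.linalg.norm(u)
        x1d = X @ u                                        # projected data
        m = means @ u                                      # projected means
        s = np.sqrt(np.einsum("i,kij,j->k", u, covs, u))   # projected std devs

        def mixture_cdf(t, m=m, s=s):
            # CDF of the projected one-dimensional mixture.
            t = np.asarray(t, dtype=float)
            return (weights * norm.cdf((t[..., None] - m) / s)).sum(axis=-1)

        pvalues.append(kstest(x1d, mixture_cdf).pvalue)
    return np.array(pvalues)


# Toy data: two well-separated spherical clusters in 2-D.
data_rng = np.random.default_rng(1)
X = np.vstack([
    data_rng.normal(size=(500, 2)) + [-5.0, 0.0],
    data_rng.normal(size=(500, 2)) + [5.0, 0.0],
])

# A one-component model (sample mean and covariance) versus the
# two-component model that generated the data.
p_one = projected_ks_pvalues(X, [1.0], X.mean(axis=0)[None], np.cov(X.T)[None])
p_two = projected_ks_pvalues(
    X, [0.5, 0.5], [[-5.0, 0.0], [5.0, 0.0]], np.stack([np.eye(2), np.eye(2)])
)
print("one component, smallest p-value:", p_one.min())
print("two components, median p-value:", np.median(p_two))
```

In a full model-selection loop, one would fit a mixture with k components, reject it if any projection fails the test, and then retry with k + 1. Note that when the model parameters are estimated from the same data being tested, standard KS p-values are optimistic, so a practical version would need adjusted critical values.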
Cite
Text
Feng and Hamerly. "PG-Means: Learning the Number of Clusters in Data." Neural Information Processing Systems, 2006.
Markdown
[Feng and Hamerly. "PG-Means: Learning the Number of Clusters in Data." Neural Information Processing Systems, 2006.](https://mlanthology.org/neurips/2006/feng2006neurips-pgmeans/)
BibTeX
@inproceedings{feng2006neurips-pgmeans,
title = {{PG-Means: Learning the Number of Clusters in Data}},
author = {Feng, Yu and Hamerly, Greg},
booktitle = {Neural Information Processing Systems},
year = {2006},
pages = {393-400},
url = {https://mlanthology.org/neurips/2006/feng2006neurips-pgmeans/}
}