Clustering Semi-Random Mixtures of Gaussians

Abstract

Gaussian mixture models (GMMs) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural robust model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. Our first contribution is a polynomial time algorithm that provably recovers the ground truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd's algorithm for k-means clustering that is the method-of-choice in practice. Our second result complements the upper bound by giving a nearly matching lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.
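The abstract refers to Lloyd's algorithm, the standard iterative heuristic for k-means that alternates between assigning points to their nearest center and recomputing each center as the mean of its assigned points. As context for readers unfamiliar with it, here is a minimal NumPy sketch of plain Lloyd's algorithm (not the paper's analyzed variant or its initialization; function and parameter names are illustrative):

```python
import numpy as np

def lloyds(X, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-center assignment
    and mean updates. X is an (n, d) array; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centers as k distinct random data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```

On well-separated mixtures (the regime the paper studies), even this basic version typically recovers the clustering; the paper's guarantees concern how such updates behave under semi-random perturbations of the Gaussian components.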

Cite

Text

Vijayaraghavan and Awasthi. "Clustering Semi-Random Mixtures of Gaussians." International Conference on Machine Learning, 2018.

Markdown

[Vijayaraghavan and Awasthi. "Clustering Semi-Random Mixtures of Gaussians." International Conference on Machine Learning, 2018.](https://mlanthology.org/icml/2018/vijayaraghavan2018icml-clustering/)

BibTeX

@inproceedings{vijayaraghavan2018icml-clustering,
  title     = {{Clustering Semi-Random Mixtures of Gaussians}},
  author    = {Vijayaraghavan, Aravindan and Awasthi, Pranjal},
  booktitle = {International Conference on Machine Learning},
  year      = {2018},
  pages     = {5055--5064},
  volume    = {80},
  url       = {https://mlanthology.org/icml/2018/vijayaraghavan2018icml-clustering/}
}