Robust K-Means: A Theoretical Revisit

Abstract

Over the last few years, many variants of the quadratic k-means clustering procedure have been proposed, all aiming to make the algorithm robust in the presence of outliers. Broadly, two main approaches have been developed: one based on penalized regularization methods, and one based on trimming functions. In this work, we present a theoretical analysis of the robustness and consistency properties of a variant of the classical quadratic k-means algorithm, robust k-means, which borrows ideas from outlier detection in regression. We show that two outliers in a dataset are enough to break down this clustering procedure. However, if we restrict attention to “well-structured” datasets, then robust k-means can recover the underlying cluster structure in spite of the outliers. Finally, we show that, with slight modifications, the most general non-asymptotic consistency results for quadratic k-means remain valid for this robust variant.
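The penalization idea the abstract alludes to is commonly formulated by giving each point an outlier vector penalized by a group-lasso term, so that points with large residuals absorb part of the residual into that vector. The sketch below illustrates this penalized formulation under stated assumptions; the alternating update scheme, variable names, and λ value are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def robust_kmeans(X, centers, lam, n_iter=50):
    """Illustrative sketch of penalized robust k-means.

    Alternately minimizes  sum_i ||x_i - o_i - c_{a_i}||^2 + lam * sum_i ||o_i||_2
    over assignments a_i, outlier vectors o_i, and centers c_j.
    """
    centers = centers.astype(float).copy()
    O = np.zeros_like(X, dtype=float)  # per-point outlier vectors o_i
    for _ in range(n_iter):
        # 1) assign each corrected point x_i - o_i to its nearest center
        d = (((X - O)[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # 2) group soft-thresholding: o_i = r_i * max(0, 1 - lam / (2 ||r_i||))
        R = X - centers[labels]
        norms = np.maximum(np.linalg.norm(R, axis=1), 1e-12)
        O = R * np.maximum(0.0, 1.0 - lam / (2.0 * norms))[:, None]
        # 3) recompute centers from the outlier-corrected points
        for j in range(len(centers)):
            mask = labels == j
            if mask.any():
                centers[j] = (X[mask] - O[mask]).mean(0)
    return centers, labels, O

# Two tight clusters plus one gross outlier (synthetic toy data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [4.9, 5.0],
              [20.0, 20.0]])
centers, labels, O = robust_kmeans(X, X[[0, 4]], lam=2.0)
```

Residuals below the threshold λ/2 are shrunk exactly to zero, so inliers keep o_i = 0, while the gross outlier absorbs most of its residual into o_i and barely perturbs the centers.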

Cite

Text

Alexandros Georgogiannis. "Robust K-Means: A Theoretical Revisit." Neural Information Processing Systems, 2016.

Markdown

[Alexandros Georgogiannis. "Robust K-Means: A Theoretical Revisit." Neural Information Processing Systems, 2016.](https://mlanthology.org/neurips/2016/georgogiannis2016neurips-robust/)

BibTeX

@inproceedings{georgogiannis2016neurips-robust,
  title     = {{Robust K-Means: A Theoretical Revisit}},
  author    = {Georgogiannis, Alexandros},
  booktitle = {Neural Information Processing Systems},
  year      = {2016},
  pages     = {2891--2899},
  url       = {https://mlanthology.org/neurips/2016/georgogiannis2016neurips-robust/}
}