Adapting K-Means Algorithms for Outliers

Abstract

This paper shows how to adapt several simple and classical sampling-based algorithms for the k-means problem to the setting with outliers. Recently, Bhaskara et al. (NeurIPS 2019) showed how to adapt the classical k-means++ algorithm to the setting with outliers. However, their algorithm needs to output O(log(k)$\cdot$z) outliers, where z is the number of true outliers, to match the O(log k)-approximation guarantee of k-means++. In this paper, we build on their ideas and show how to adapt several sequential and distributed k-means algorithms to the setting with outliers, but with substantially stronger theoretical guarantees: our algorithms output (1 + $\epsilon$)z outliers while achieving an O(1/$\epsilon$)-approximation to the objective function. In the sequential world, we achieve this by adapting a recent algorithm of Lattanzi and Sohler (ICML 2019). In the distributed setting, we adapt a simple algorithm of Guha et al. (IEEE Trans. Know. and Data Engineering 2003) and the popular k-means\|of Bahmani et al. (PVLDB2012). A theoretical application of our techniques is an algorithm with running time O(nk^2/z) that achieves an O(1)-approximation to the objective function while outputting O(z) outliers, assuming k << z << n. This is complemented with a matching lower bound of $\Omega$(nk^2/z) for this problem in the oracle model.

Cite

Text

Grunau and Rozhoň. "Adapting K-Means Algorithms for Outliers." International Conference on Machine Learning, 2022.

Markdown

[Grunau and Rozhoň. "Adapting K-Means Algorithms for Outliers." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/grunau2022icml-adapting/)

BibTeX

@inproceedings{grunau2022icml-adapting,
  title     = {{Adapting K-Means Algorithms for Outliers}},
  author    = {Grunau, Christoph and Rozhoň, Václav},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {7845-7886},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/grunau2022icml-adapting/}
}