Fast and Accurate $k$-Means++ via Rejection Sampling

Abstract

$k$-means++ \cite{arthur2007k} is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees and strong empirical performance. Despite its wide adoption, $k$-means++ sometimes suffers from being slow on large data-sets so a natural question has been to obtain more efficient algorithms with similar guarantees. In this paper, we present such a near linear time algorithm for $k$-means++ seeding. Interestingly our algorithm obtains the same theoretical guarantees as $k$-means++ and significantly improves earlier results on fast $k$-means++ seeding. Moreover, we show empirically that our algorithm is significantly faster than $k$-means++ and obtains solutions of equivalent quality.

Cite

Text

Cohen-Addad et al. "Fast and Accurate $k$-Means++ via Rejection Sampling." Neural Information Processing Systems, 2020.

Markdown

[Cohen-Addad et al. "Fast and Accurate $k$-Means++ via Rejection Sampling." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/cohenaddad2020neurips-fast/)

BibTeX

@inproceedings{cohenaddad2020neurips-fast,
  title     = {{Fast and Accurate $k$-Means++ via Rejection Sampling}},
  author    = {Cohen-Addad, Vincent and Lattanzi, Silvio and Norouzi-Fard, Ashkan and Sohler, Christian and Svensson, Ola},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/cohenaddad2020neurips-fast/}
}