Multi-Swap K-Means++
Abstract
The $k$-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice for optimizing the popular $k$-means clustering objective and is known to give an $O(\log k)$-approximation in expectation. To obtain higher quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting $k$-means++ with $O(k \log \log k)$ local-search steps, where candidate swaps are drawn from the $k$-means++ sampling distribution, to yield a $c$-approximation to the $k$-means clustering problem, where $c$ is a large absolute constant. Here we generalize and extend their local-search algorithm by considering larger and more sophisticated local-search neighborhoods, thereby allowing multiple centers to be swapped at the same time. Our algorithm achieves a $9 + \varepsilon$ approximation ratio, which is the best possible for local search. Importantly, we show that our algorithm is practical: it is easy to implement, fast enough to run on a variety of classic datasets, and outputs solutions of better cost.
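To make the approach concrete, here is a minimal sketch of the single-swap variant (in the spirit of Lattanzi and Sohler) that the paper generalizes: seed with $k$-means++, then repeatedly sample a candidate center from the $D^2$ distribution and perform the best cost-improving swap, if any. All function names are illustrative, and this naive implementation recomputes full distance matrices rather than using the optimizations a practical version would need.

```python
import numpy as np

def kmeans_cost(X, centers):
    # Sum of squared distances from each point to its nearest center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()

def d2_sample(X, centers, rng):
    # Sample a point with probability proportional to its squared distance
    # to the nearest current center (the k-means++ sampling distribution).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
    return X[rng.choice(len(X), p=d2 / d2.sum())]

def kmeanspp_local_search(X, k, steps, rng=None):
    rng = rng or np.random.default_rng(0)
    # --- k-means++ seeding: first center uniform, the rest D^2-sampled ---
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        centers.append(d2_sample(X, np.array(centers), rng))
    centers = np.array(centers)
    # --- single-swap local search ---
    for _ in range(steps):
        cand = d2_sample(X, centers, rng)
        best_cost, best_i = kmeans_cost(X, centers), None
        for i in range(k):
            trial = centers.copy()
            trial[i] = cand          # try swapping center i for the candidate
            c = kmeans_cost(X, trial)
            if c < best_cost:
                best_cost, best_i = c, i
        if best_i is not None:       # keep the swap only if it lowers the cost
            centers[best_i] = cand
    return centers
```

The multi-swap generalization studied in the paper enlarges this neighborhood by swapping several centers in and out simultaneously, which is what lifts the approximation guarantee to $9 + \varepsilon$.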
Cite
Text
Beretta et al. "Multi-Swap K-Means++." Neural Information Processing Systems, 2023.

Markdown

[Beretta et al. "Multi-Swap K-Means++." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/beretta2023neurips-multiswap/)

BibTeX
@inproceedings{beretta2023neurips-multiswap,
title = {{Multi-Swap K-Means++}},
author = {Beretta, Lorenzo and Cohen-Addad, Vincent and Lattanzi, Silvio and Parotsidis, Nikos},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/beretta2023neurips-multiswap/}
}