A Faster Sampling Algorithm for Spherical $k$-Means

Abstract

The Spherical $k$-means algorithm proposed by (Dhillon and Modha, 2001) is a popular algorithm for clustering high dimensional datasets. Although their algorithm is simple and easy to implement, a drawback of the same is that it doesn’t provide any provable guarantee on the clustering result. (Endo and Miyamoto, 2015) suggest an adaptive sampling based algorithm (Spherical $k$-means$++$) which gives near optimal results, with high probability. However, their algorithm requires $k$ sequential passes over the entire dataset, which may not be feasible when the dataset and/or the values of $k$ are large. In this work, we propose a Markov chain based sampling algorithm that takes only one pass over the data, and gives close to optimal clustering similar to Spherical $k$-means$++$, i.e., a faster algorithm while maintaining almost the same approximation. We present a theoretical analysis of the algorithm, and complement it with rigorous experiments on real-world datasets. Our proposed algorithm is simple and easy to implement, and can be easily adopted in practice.

Cite

Text

Pratap et al. "A Faster Sampling Algorithm for Spherical $k$-Means." Proceedings of The 10th Asian Conference on Machine Learning, 2018.

Markdown

[Pratap et al. "A Faster Sampling Algorithm for Spherical $k$-Means." Proceedings of The 10th Asian Conference on Machine Learning, 2018.](https://mlanthology.org/acml/2018/pratap2018acml-faster/)

BibTeX

@inproceedings{pratap2018acml-faster,
  title     = {{A Faster Sampling Algorithm for Spherical $k$-Means}},
  author    = {Pratap, Rameshwar and Deshmukh, Anup and Nair, Pratheeksha and Dutt, Tarun},
  booktitle = {Proceedings of The 10th Asian Conference on Machine Learning},
  year      = {2018},
  pages     = {343-358},
  volume    = {95},
  url       = {https://mlanthology.org/acml/2018/pratap2018acml-faster/}
}