A Faster Sampling Algorithm for Spherical $k$-Means

Rameshwar Pratap, Anup Deshmukh, Pratheeksha Nair, Tarun Dutt

ACML 2018 pp. 343-358


Abstract

The Spherical $k$-means algorithm proposed by Dhillon and Modha (2001) is a popular algorithm for clustering high-dimensional datasets. Although the algorithm is simple and easy to implement, it does not provide any provable guarantee on the clustering result. Endo and Miyamoto (2015) suggest an adaptive-sampling-based algorithm (Spherical $k$-means$++$) which gives near-optimal results with high probability. However, their algorithm requires $k$ sequential passes over the entire dataset, which may not be feasible when the dataset and/or the value of $k$ is large. In this work, we propose a Markov-chain-based sampling algorithm that takes only one pass over the data and gives a close-to-optimal clustering similar to that of Spherical $k$-means$++$; that is, it is faster while maintaining almost the same approximation guarantee. We present a theoretical analysis of the algorithm and complement it with rigorous experiments on real-world datasets. Our proposed algorithm is simple, easy to implement, and readily adopted in practice.
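To illustrate the general idea of replacing adaptive sampling with a Markov chain, the following is a minimal sketch and not the authors' exact procedure. It assumes the dissimilarity between a unit vector and the current seeds is one minus the largest cosine similarity, and it assumes a Metropolis-Hastings chain with a uniform proposal; the function names (`normalize`, `dissimilarity`, `mcmc_seeding`) and the `chain_length` parameter are hypothetical, and input vectors are assumed to be nonzero.

```python
import math
import random


def normalize(v):
    """Scale a nonzero vector to unit length (points live on the unit sphere)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dissimilarity(x, centers):
    """Assumed dissimilarity: 1 - (largest cosine similarity to any center).

    Clamped at 0 so floating-point round-off never yields a negative value.
    """
    best = min(1.0 - sum(a * b for a, b in zip(x, c)) for c in centers)
    return max(0.0, best)


def mcmc_seeding(points, k, chain_length, rng):
    """Choose k seeds; every seed after the first is the end state of a short
    Metropolis-Hastings chain whose target favors points far from the seeds."""
    data = [normalize(p) for p in points]
    centers = [rng.choice(data)]  # first seed: uniform at random
    for _ in range(k - 1):
        x = rng.choice(data)  # initial state of the chain
        dx = dissimilarity(x, centers)
        for _ in range(chain_length - 1):
            y = rng.choice(data)  # uniform proposal (assumption of this sketch)
            dy = dissimilarity(y, centers)
            # Accept with probability min(1, dy / dx); always accept if dx == 0.
            if dx == 0.0 or rng.random() < dy / dx:
                x, dx = y, dy
        centers.append(x)
    return centers


if __name__ == "__main__":
    toy = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [-1, 0]]
    for seed in mcmc_seeding(toy, k=3, chain_length=20, rng=random.Random(0)):
        print(seed)
```

In this sketch each new seed inspects only `chain_length` sampled points rather than the whole dataset, which is where the saving over $k$ full passes would come from. A longer chain approximates the adaptive-sampling distribution more closely at higher cost.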


Cite

Text

Pratap et al. "A Faster Sampling Algorithm for Spherical $k$-Means." Proceedings of The 10th Asian Conference on Machine Learning, 2018.

Markdown

[Pratap et al. "A Faster Sampling Algorithm for Spherical $k$-Means." Proceedings of The 10th Asian Conference on Machine Learning, 2018.](https://mlanthology.org/acml/2018/pratap2018acml-faster/)

BibTeX

@inproceedings{pratap2018acml-faster,
  title     = {{A Faster Sampling Algorithm for Spherical $k$-Means}},
  author    = {Pratap, Rameshwar and Deshmukh, Anup and Nair, Pratheeksha and Dutt, Tarun},
  booktitle = {Proceedings of The 10th Asian Conference on Machine Learning},
  year      = {2018},
  pages     = {343-358},
  volume    = {95},
  url       = {https://mlanthology.org/acml/2018/pratap2018acml-faster/}
}