A Faster Sampling Algorithm for Spherical $k$-Means
Abstract
The Spherical $k$-means algorithm proposed by (Dhillon and Modha, 2001) is a popular algorithm for clustering high dimensional datasets. Although their algorithm is simple and easy to implement, a drawback of the same is that it doesn’t provide any provable guarantee on the clustering result. (Endo and Miyamoto, 2015) suggest an adaptive sampling based algorithm (Spherical $k$-means$++$) which gives near optimal results, with high probability. However, their algorithm requires $k$ sequential passes over the entire dataset, which may not be feasible when the dataset and/or the values of $k$ are large. In this work, we propose a Markov chain based sampling algorithm that takes only one pass over the data, and gives close to optimal clustering similar to Spherical $k$-means$++$, i.e., a faster algorithm while maintaining almost the same approximation. We present a theoretical analysis of the algorithm, and complement it with rigorous experiments on real-world datasets. Our proposed algorithm is simple and easy to implement, and can be easily adopted in practice.
Cite
Text
Pratap et al. "A Faster Sampling Algorithm for Spherical $k$-Means." Proceedings of The 10th Asian Conference on Machine Learning, 2018.Markdown
[Pratap et al. "A Faster Sampling Algorithm for Spherical $k$-Means." Proceedings of The 10th Asian Conference on Machine Learning, 2018.](https://mlanthology.org/acml/2018/pratap2018acml-faster/)BibTeX
@inproceedings{pratap2018acml-faster,
title = {{A Faster Sampling Algorithm for Spherical $k$-Means}},
author = {Pratap, Rameshwar and Deshmukh, Anup and Nair, Pratheeksha and Dutt, Tarun},
booktitle = {Proceedings of The 10th Asian Conference on Machine Learning},
year = {2018},
pages = {343-358},
volume = {95},
url = {https://mlanthology.org/acml/2018/pratap2018acml-faster/}
}