Randomized Sketches for Clustering: Fast and Optimal Kernel $k$-Means

Abstract

Kernel $k$-means is arguably one of the most common approaches to clustering. In this paper, we investigate the efficiency of kernel $k$-means combined with randomized sketches in terms of both statistical analysis and computational requirements. More precisely, we propose a unified randomized sketching framework for kernel $k$-means and investigate its excess risk bounds, obtaining a state-of-the-art risk bound with only a fraction of the computation. Indeed, we prove that a sketch dimension of $\Omega(\sqrt{n})$ suffices to achieve the same accuracy as exact kernel $k$-means while greatly reducing the computational cost, for sub-Gaussian sketches, randomized orthogonal system (ROS) sketches, and Nyström kernel $k$-means, where $n$ is the number of samples. To the best of our knowledge, this is the first result of this kind for unsupervised learning. Finally, numerical experiments on simulated data and real-world datasets validate our theoretical analysis.
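
To make the Nyström variant concrete, below is a minimal, illustrative Python sketch: sample $m = \lceil\sqrt{n}\rceil$ landmark points (the sketch dimension the abstract highlights), build the Nyström feature map $\Phi = K_{nm} K_{mm}^{-1/2}$, and run ordinary linear $k$-means in the sketched feature space. The RBF kernel, uniform landmark sampling, and the use of scikit-learn's KMeans are assumptions made for illustration, not the paper's exact algorithm; the function and parameter names (nystrom_kernel_kmeans, gamma) are hypothetical.

import numpy as np
from sklearn.cluster import KMeans

def nystrom_kernel_kmeans(X, k, m=None, gamma=1.0, seed=0):
    """Illustrative Nystrom-sketched kernel k-means (not the authors'
    reference implementation). X: (n, d) data, k: number of clusters,
    m: sketch dimension; the paper's theory suggests m = Omega(sqrt(n))."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    if m is None:
        m = int(np.ceil(np.sqrt(n)))  # sketch dimension ~ sqrt(n)
    # Sample m landmark points uniformly at random (an assumption here;
    # other sampling schemes are possible).
    idx = rng.choice(n, size=m, replace=False)
    L = X[idx]

    def rbf(A, B):
        # Pairwise RBF kernel exp(-gamma * ||a - b||^2).
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    K_nm = rbf(X, L)   # data vs. landmarks, shape (n, m)
    K_mm = rbf(L, L)   # landmarks vs. landmarks, shape (m, m)
    # Nystrom feature map Phi = K_nm K_mm^{-1/2}, via eigendecomposition.
    w, V = np.linalg.eigh(K_mm)
    w = np.maximum(w, 1e-12)  # guard against numerically zero eigenvalues
    Phi = K_nm @ (V / np.sqrt(w)) @ V.T
    # Linear k-means in the m-dimensional sketched feature space stands in
    # for exact kernel k-means on the full (n, n) kernel matrix.
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Phi)

Because clustering happens in an m-dimensional space rather than via the full $n \times n$ kernel matrix, the per-iteration cost drops from $O(n^2)$ to roughly $O(nm)$, which is the computational saving the abstract refers to.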

Cite

Text

Yin et al. "Randomized Sketches for Clustering: Fast and Optimal Kernel $k$-Means." Neural Information Processing Systems, 2022.

Markdown

[Yin et al. "Randomized Sketches for Clustering: Fast and Optimal Kernel $k$-Means." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/yin2022neurips-randomized/)

BibTeX

@inproceedings{yin2022neurips-randomized,
  title     = {{Randomized Sketches for Clustering: Fast and Optimal Kernel $k$-Means}},
  author    = {Yin, Rong and Liu, Yong and Wang, Weiping and Meng, Dan},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/yin2022neurips-randomized/}
}