Random Projections for $k$-Means Clustering

Abstract

This paper discusses the topic of dimensionality reduction for $k$-means clustering. We prove that any set of $n$ points in $d$ dimensions (rows in a matrix $A \in \RR^{n \times d}$) can be projected into $t = \Omega(k / \eps^2)$ dimensions, for any $\eps \in (0,1/3)$, in $O(n d \lceil \eps^{-2} k/ \log(d) \rceil )$ time, such that with constant probability the optimal $k$-partition of the point set is preserved within a factor of $2+\eps$. The projection is done by post-multiplying $A$ with a $d \times t$ random matrix $R$ having entries $+1/\sqrt{t}$ or $-1/\sqrt{t}$ with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.

Cite

Text

Boutsidis et al. "Random Projections for $k$-Means Clustering." Neural Information Processing Systems, 2010.

Markdown

[Boutsidis et al. "Random Projections for $k$-Means Clustering." Neural Information Processing Systems, 2010.](https://mlanthology.org/neurips/2010/boutsidis2010neurips-random/)

BibTeX

@inproceedings{boutsidis2010neurips-random,
  title     = {{Random Projections for $k$-Means Clustering}},
  author    = {Boutsidis, Christos and Zouzias, Anastasios and Drineas, Petros},
  booktitle = {Neural Information Processing Systems},
  year      = {2010},
  pages     = {298-306},
  url       = {https://mlanthology.org/neurips/2010/boutsidis2010neurips-random/}
}