Unsupervised Feature Selection for the $k$-Means Clustering Problem

Abstract

We present a novel feature selection algorithm for the $k$-means clustering problem. Our algorithm is randomized and, given an accuracy parameter $\epsilon \in (0,1)$, selects, and appropriately rescales, $\Theta(k \log(k / \epsilon) / \epsilon^2)$ features from a dataset of arbitrary dimensionality in an unsupervised manner. We prove that, if we run any $\gamma$-approximate $k$-means algorithm ($\gamma \geq 1$) on the features selected using our method, we can find a $(1+(1+\epsilon)\gamma)$-approximate partition with high probability.
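The sampling step described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact procedure: it assumes the common randomized scheme of sampling columns (features) with probabilities proportional to their leverage scores in the top-$k$ right singular subspace, then rescaling each sampled column by $1/\sqrt{c\,p_i}$; the function name `sample_features` and its signature are hypothetical.

```python
import numpy as np

def sample_features(A, k, c, rng=None):
    """Sample and rescale c columns (features) of the n-by-d data matrix A.

    Illustrative sketch: probabilities are the leverage scores of the
    top-k right singular subspace of A, a standard choice in randomized
    feature selection (not necessarily the paper's exact distribution).
    """
    rng = np.random.default_rng(rng)
    # Top-k right singular vectors of A (rows of Vt).
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :]                       # k x d
    # Leverage scores: squared column norms of Vk, normalized to sum to 1.
    p = (Vk ** 2).sum(axis=0)
    p /= p.sum()
    # Draw c feature indices i.i.d. with replacement.
    idx = rng.choice(A.shape[1], size=c, p=p)
    # Rescale each sampled column by 1 / sqrt(c * p_i).
    scale = 1.0 / np.sqrt(c * p[idx])
    return A[:, idx] * scale, idx

# Any gamma-approximate k-means solver can then be run on the reduced
# n-by-c matrix instead of the full n-by-d matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
R, idx = sample_features(A, k=3, c=10, rng=1)
print(R.shape)  # (50, 10)
```

With $c = \Theta(k \log(k/\epsilon)/\epsilon^2)$ sampled features, the theorem guarantees the clustering cost on the reduced matrix approximates the cost on the full data up to the stated $(1+(1+\epsilon)\gamma)$ factor.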

Cite

Text

Boutsidis et al. "Unsupervised Feature Selection for the $k$-Means Clustering Problem." Neural Information Processing Systems, 2009.

Markdown

[Boutsidis et al. "Unsupervised Feature Selection for the $k$-Means Clustering Problem." Neural Information Processing Systems, 2009.](https://mlanthology.org/neurips/2009/boutsidis2009neurips-unsupervised/)

BibTeX

@inproceedings{boutsidis2009neurips-unsupervised,
  title     = {{Unsupervised Feature Selection for the $k$-Means Clustering Problem}},
  author    = {Boutsidis, Christos and Drineas, Petros and Mahoney, Michael W.},
  booktitle = {Neural Information Processing Systems},
  year      = {2009},
  pages     = {153--161},
  url       = {https://mlanthology.org/neurips/2009/boutsidis2009neurips-unsupervised/}
}