Simple and Scalable Sparse K-Means Clustering via Feature Ranking

Abstract

Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant for distinguishing clusters. This has motivated the development of sparse clustering techniques that typically embed k-means within outer algorithms of high computational complexity. Current techniques also require careful tuning of shrinkage parameters, which further limits their scalability. In this paper, we propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms. We show that our algorithm enjoys consistency and convergence guarantees. Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings. We showcase these contributions thoroughly via simulated experiments and real data benchmarks, including a case study on protein expression in trisomic mice.
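To make the general idea concrete, here is a minimal illustrative sketch of sparse k-means via feature ranking: alternate plain k-means on the currently selected features with a ranking step that keeps the `s` features carrying the largest between-cluster sum of squares. This is a hypothetical simplification written for this page, not the authors' exact algorithm; all function names and the ranking rule are assumptions, and the farthest-point initialization is just a convenient deterministic choice.

```python
import numpy as np


def _farthest_point_init(X, k, rng):
    """Pick one random center, then repeatedly take the point
    farthest from the current centers (simple deterministic init)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    return np.array(centers)


def _kmeans(X, k, rng, iters=50):
    """Plain Lloyd's algorithm; returns cluster labels."""
    centers = _farthest_point_init(X, k, rng)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def sparse_kmeans_by_ranking(X, k, s, outer_iters=5, seed=0):
    """Illustrative sketch (not the paper's exact method): alternate
    (i) k-means on the currently selected features with
    (ii) keeping the s features with the largest between-cluster
    sum of squares under the current labeling."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    mask = np.ones(X.shape[1], dtype=bool)  # start with all features
    for _ in range(outer_iters):
        labels = _kmeans(X * mask, k, rng)  # zeroed features don't affect distances
        overall = X.mean(axis=0)
        bcss = np.zeros(X.shape[1])  # per-feature between-cluster sum of squares
        for j in range(k):
            members = labels == j
            if members.any():
                bcss += members.sum() * (X[members].mean(axis=0) - overall) ** 2
        mask = np.zeros(X.shape[1], dtype=bool)
        mask[np.argsort(bcss)[-s:]] = True  # keep the s top-ranked features
    return labels, mask
```

For example, on data where only the first two of ten features separate two clusters, the ranking step should recover exactly those two features while the remaining noise features are dropped. Ranking and truncating avoids the shrinkage-parameter tuning that lasso-style sparse clustering penalties require, which is the intuition the abstract gestures at.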

Cite

Text

Zhang et al. "Simple and Scalable Sparse K-Means Clustering via Feature Ranking." Neural Information Processing Systems, 2020.

BibTeX

@inproceedings{zhang2020neurips-simple,
  title     = {{Simple and Scalable Sparse K-Means Clustering via Feature Ranking}},
  author    = {Zhang, Zhiyue and Lange, Kenneth and Xu, Jason},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/zhang2020neurips-simple/}
}