Streaming K-Means Approximation

Abstract

We provide a clustering algorithm that approximately optimizes the k-means objective, in the one-pass streaming setting. We make no assumptions about the data, and our algorithm is very light-weight in terms of memory, and computation. This setting is applicable to unsupervised learning on massive data sets, or resource-constrained devices. The two main ingredients of our theoretical work are: a derivation of an extremely simple pseudo-approximation batch algorithm for k-means, in which the algorithm is allowed to output more than k centers (based on the recent k-means++"), and a streaming clustering algorithm in which batch clustering algorithms are performed on small inputs (fitting in memory) and combined in a hierarchical manner. Empirical evaluations on real and simulated data reveal the practical utility of our method."

Cite

Text

Ailon et al. "Streaming K-Means Approximation." Neural Information Processing Systems, 2009.

Markdown

[Ailon et al. "Streaming K-Means Approximation." Neural Information Processing Systems, 2009.](https://mlanthology.org/neurips/2009/ailon2009neurips-streaming/)

BibTeX

@inproceedings{ailon2009neurips-streaming,
  title     = {{Streaming K-Means Approximation}},
  author    = {Ailon, Nir and Jaiswal, Ragesh and Monteleoni, Claire},
  booktitle = {Neural Information Processing Systems},
  year      = {2009},
  pages     = {10-18},
  url       = {https://mlanthology.org/neurips/2009/ailon2009neurips-streaming/}
}