Compressed K-Means for Large-Scale Clustering

Abstract

Large-scale clustering is widely used in many applications and has received much attention. Most existing clustering methods suffer from both expensive computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited to fast clustering. CKM enjoys two key benefits: 1) storage is significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using the Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within a single framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.

Cite

Text

Shen et al. "Compressed K-Means for Large-Scale Clustering." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.10852

Markdown

[Shen et al. "Compressed K-Means for Large-Scale Clustering." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/shen2017aaai-compressed/) doi:10.1609/AAAI.V31I1.10852

BibTeX

@inproceedings{shen2017aaai-compressed,
  title     = {{Compressed K-Means for Large-Scale Clustering}},
  author    = {Shen, Xiao-Bo and Liu, Weiwei and Tsang, Ivor W. and Shen, Fumin and Sun, Quan-Sen},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {2527--2533},
  doi       = {10.1609/AAAI.V31I1.10852},
  url       = {https://mlanthology.org/aaai/2017/shen2017aaai-compressed/}
}