Time and Space Efficient Spectral Clustering via Column Sampling

Abstract

Spectral clustering is an elegant and powerful approach for clustering. However, the underlying eigen-decomposition takes cubic time and quadratic space w.r.t. the data set size. These can be reduced by the Nystrm method which samples only a subset of columns from the matrix. However, the manipulation and storage of these sampled columns can still be expensive when the data set is large. In this paper, we propose a time- and space-efficient spectral clustering algorithm which can scale to very large data sets. A general procedure to orthogonalize the approximated eigenvectors is also proposed. Extensive spectral clustering experiments on a number of data sets, ranging in size from a few thousands to several millions, demonstrate the accuracy and scalability of the proposed approach. We further apply it to the task of image segmentation. For images with more than 10 millions pixels, this algorithm can obtain the eigenvectors in 1 minute on a single machine.

Cite

Text

Li et al. "Time and Space Efficient Spectral Clustering via Column Sampling." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2011. doi:10.1109/CVPR.2011.5995425

Markdown

[Li et al. "Time and Space Efficient Spectral Clustering via Column Sampling." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2011.](https://mlanthology.org/cvpr/2011/li2011cvpr-time/) doi:10.1109/CVPR.2011.5995425

BibTeX

@inproceedings{li2011cvpr-time,
  title     = {{Time and Space Efficient Spectral Clustering via Column Sampling}},
  author    = {Li, Mu and Lian, Xiao-Chen and Kwok, James T. and Lu, Bao-Liang},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2011},
  pages     = {2297-2304},
  doi       = {10.1109/CVPR.2011.5995425},
  url       = {https://mlanthology.org/cvpr/2011/li2011cvpr-time/}
}