Parallel Spectral Clustering

Abstract

Spectral clustering algorithms have been shown to be more effective in finding clusters than most traditional algorithms. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the dataset is large. To perform clustering on large datasets, we propose to parallelize both memory use and computation on distributed computers. Through an empirical study on a large document dataset of 193,844 data instances and a large photo dataset of 637,137 instances, we demonstrate that our parallel algorithm can effectively alleviate the scalability problem.

Cite

Text

Song et al. "Parallel Spectral Clustering." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2008. doi:10.1007/978-3-540-87481-2_25

Markdown

[Song et al. "Parallel Spectral Clustering." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2008.](https://mlanthology.org/ecmlpkdd/2008/song2008ecmlpkdd-parallel/) doi:10.1007/978-3-540-87481-2_25

BibTeX

@inproceedings{song2008ecmlpkdd-parallel,
  title     = {{Parallel Spectral Clustering}},
  author    = {Song, Yangqiu and Chen, WenYen and Bai, Hongjie and Lin, Chih-Jen and Chang, Edward Y.},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2008},
  pages     = {374--389},
  doi       = {10.1007/978-3-540-87481-2_25},
  url       = {https://mlanthology.org/ecmlpkdd/2008/song2008ecmlpkdd-parallel/}
}