CC-MR - Finding Connected Components in Huge Graphs with MapReduce
Abstract
The detection of connected components in graphs is a well-known problem arising in a large number of applications including data mining, analysis of social networks, image analysis and a lot of other related problems. In spite of the existing very efficient serial algorithms, this problem remains a subject of research due to increasing data amounts produced by modern information systems which cannot be handled by single workstations. Only highly parallelized approaches on multi-core-servers or computer clusters are able to deal with these large-scale data sets. In this work we present a solution for this problem for distributed memory architectures, and provide an implementation for the well-known MapReduce framework developed by Google. Our algorithm CC-MR significantly outperforms the existing approaches for the MapReduce framework in terms of the number of necessary iterations, communication costs and execution runtime, as we show in our experimental evaluation on synthetic and real-world data. Furthermore, we present a technique for accelerating our implementation for datasets with very heterogeneous component sizes as they often appear in real data sets.
Cite
Text
Seidl et al. "CC-MR - Finding Connected Components in Huge Graphs with MapReduce." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012. doi:10.1007/978-3-642-33460-3_35Markdown
[Seidl et al. "CC-MR - Finding Connected Components in Huge Graphs with MapReduce." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012.](https://mlanthology.org/ecmlpkdd/2012/seidl2012ecmlpkdd-ccmr/) doi:10.1007/978-3-642-33460-3_35BibTeX
@inproceedings{seidl2012ecmlpkdd-ccmr,
title = {{CC-MR - Finding Connected Components in Huge Graphs with MapReduce}},
author = {Seidl, Thomas and Boden, Brigitte and Fries, Sergej},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2012},
pages = {458-473},
doi = {10.1007/978-3-642-33460-3_35},
url = {https://mlanthology.org/ecmlpkdd/2012/seidl2012ecmlpkdd-ccmr/}
}