Sampling-Based Data Mining Algorithms: Modern Techniques and Case Studies

Abstract

Sampling a dataset for faster analysis and viewing the dataset itself as a sample from an unknown distribution are two sides of the same coin. We discuss how modern techniques based on the Vapnik-Chervonenkis (VC) dimension can be used to study the trade-off between the sample size and the accuracy of the data mining results obtained from the sample. We report two case studies in which we and our collaborators employed these techniques to develop efficient sampling-based algorithms for two problems: computing betweenness centrality in large graphs and extracting statistically significant frequent itemsets from transactional datasets.
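To make the trade-off between sample size and accuracy concrete, the sketch below evaluates a sample-size bound of the general form found in the VC-dimension literature, m >= (c / eps^2) * (d + ln(1 / delta)), where d is an upper bound on the VC dimension, eps is the maximum allowed error, and delta is the allowed failure probability. This is an illustration under stated assumptions, not necessarily the exact bound used in the paper: the function name `vc_sample_size` is hypothetical, and the universal constant `c` is left as a parameter whose default of 0.5 is an assumption.

```python
import math


def vc_sample_size(vc_dim: int, eps: float, delta: float, c: float = 0.5) -> int:
    """Sample size sufficient for an eps-approximation with probability >= 1 - delta.

    Evaluates m = ceil((c / eps^2) * (vc_dim + ln(1 / delta))).
    The constant c is an assumption (illustrative default 0.5); the bound only
    holds for the value of c that the underlying theorem guarantees.
    """
    if vc_dim < 0:
        raise ValueError("vc_dim must be non-negative")
    if not (0.0 < eps < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("eps and delta must lie strictly between 0 and 1")
    if c <= 0:
        raise ValueError("c must be positive")
    return math.ceil((c / eps ** 2) * (vc_dim + math.log(1.0 / delta)))


if __name__ == "__main__":
    # Halving eps roughly quadruples the sample size; the dataset size never appears.
    for eps in (0.1, 0.05, 0.025):
        print(eps, vc_sample_size(vc_dim=10, eps=eps, delta=0.1))
```

Note that the bound depends on the VC dimension, eps, and delta, but not on the size of the dataset, which is what makes sampling attractive for very large inputs.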

Cite

Text

Riondato. "Sampling-Based Data Mining Algorithms: Modern Techniques and Case Studies." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2014. doi:10.1007/978-3-662-44845-8_48

Markdown

[Riondato. "Sampling-Based Data Mining Algorithms: Modern Techniques and Case Studies." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2014.](https://mlanthology.org/ecmlpkdd/2014/riondato2014ecmlpkdd-samplingbased/) doi:10.1007/978-3-662-44845-8_48

BibTeX

@inproceedings{riondato2014ecmlpkdd-samplingbased,
  title     = {{Sampling-Based Data Mining Algorithms: Modern Techniques and Case Studies}},
  author    = {Riondato, Matteo},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2014},
  pages     = {516--519},
  doi       = {10.1007/978-3-662-44845-8_48},
  url       = {https://mlanthology.org/ecmlpkdd/2014/riondato2014ecmlpkdd-samplingbased/}
}