Scalability, Search, and Sampling: From Smart Algorithms to Active Discovery

Abstract

The focus on scalability to very large datasets has been a distinguishing feature of the KDD endeavour right from the start of the area. In the present stage of its development, the field has begun to seriously approach the issue, and a number of different techniques for scaling up KDD algorithms have emerged. Traditionally, such techniques are concentrating on the search aspects of the problem, employing algorithmic techniques to avoid searching parts of the space or to speed up processing by exploiting properties of the underlying host systems. Such techniques guarantee perfect correctness of solutions, but can never reach sublinear complexity. In contrast, researchers have recently begun to take a fresh and principled look at stochastic sampling techniques which give only an approximate quality guarantee, but can make runtimes almost independent of the size of the database at hand. In the talk, we give an overview of both of these classes of approaches, focusing on individual examples from our own work for more detailed illustrations of how such techniques work. We briefly outline how active learning elements may enhance KDD approaches in the future.

Cite

Text

Wrobel. "Scalability, Search, and Sampling: From Smart Algorithms to Active Discovery." European Conference on Machine Learning, 2001. doi:10.1007/3-540-44795-4_55

Markdown

[Wrobel. "Scalability, Search, and Sampling: From Smart Algorithms to Active Discovery." European Conference on Machine Learning, 2001.](https://mlanthology.org/ecmlpkdd/2001/wrobel2001ecml-scalability/) doi:10.1007/3-540-44795-4_55

BibTeX

@inproceedings{wrobel2001ecml-scalability,
  title     = {{Scalability, Search, and Sampling: From Smart Algorithms to Active Discovery}},
  author    = {Wrobel, Stefan},
  booktitle = {European Conference on Machine Learning},
  year      = {2001},
  pages     = {615},
  doi       = {10.1007/3-540-44795-4_55},
  url       = {https://mlanthology.org/ecmlpkdd/2001/wrobel2001ecml-scalability/}
}