Incremental Maximization of Non-Instance-Averaging Utility Functions with Applications to Knowledge Discovery Problems

Abstract

Data-independent sample bounds are known to grossly overestimate the amount of data needed for most individual problem instances. This has led to significant recent interest in sequential algorithms which also give precise guarantees about the quality of results, but determine the amount of data needed based on characteristics of the actual problem instance at hand, and thus need significantly fewer examples. In this paper, we present a practical sequential sampling algorithm which (a) is capable of quickly identifying the n best hypotheses and (b) works for all utility functions that can be estimated with bounded error. The algorithm thus can be used not only for predictive learning, but also for tasks such as association rule finding or subgroup discovery. Our experiments with two real-world domains show that our algorithm can be orders of magnitude faster than non-sequential sampling algorithms.

Cite

Text

Scheffer and Wrobel. "Incremental Maximization of Non-Instance-Averaging Utility Functions with Applications to Knowledge Discovery Problems." International Conference on Machine Learning, 2001.

Markdown

[Scheffer and Wrobel. "Incremental Maximization of Non-Instance-Averaging Utility Functions with Applications to Knowledge Discovery Problems." International Conference on Machine Learning, 2001.](https://mlanthology.org/icml/2001/scheffer2001icml-incremental/)

BibTeX

@inproceedings{scheffer2001icml-incremental,
  title     = {{Incremental Maximization of Non-Instance-Averaging Utility Functions with Applications to Knowledge Discovery Problems}},
  author    = {Scheffer, Tobias and Wrobel, Stefan},
  booktitle = {International Conference on Machine Learning},
  year      = {2001},
  pages     = {481-488},
  url       = {https://mlanthology.org/icml/2001/scheffer2001icml-incremental/}
}