Peepholing: Choosing Attributes Efficiently for Megainduction

Abstract

This paper presents a new method of speeding up the induction of decision trees from large noisy domains with continuous attributes. Empirical evaluations suggest it is several times faster than the basic ID3 algorithm on training sets of tens of thousands of examples, and for very large sets it reduces learning time from a superlinear to an approximately linear function of the number of examples. This makes induction practical on tasks previously considered too large to be feasible. The method works by examining random subsets (sometimes called subsamples) of the larger sets of examples to be assessed. From these it is typically able to eliminate from consideration many of the attributes and a large fraction of their ranges of values. It does not cause a significant change in accuracy, and is not greatly affected by noise.
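The core idea in the abstract, estimating each attribute's usefulness on a random subsample and discarding attributes that look clearly inferior before the full (expensive) evaluation, can be illustrated with a minimal sketch. This is not Catlett's exact peepholing procedure; the gain criterion, the `slack` tolerance, and all function names here are illustrative assumptions.

```python
import math
import random

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_split_gain(values, labels):
    """Best information gain over all threshold splits of one
    continuous attribute, plus the threshold achieving it."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_gain, best_thr = 0.0, None
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no valid threshold between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_gain = gain
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_gain, best_thr

def peephole_filter(examples, labels, sample_size, slack=0.5, rng=None):
    """Estimate each attribute's best gain on a random subsample and
    keep only attributes whose estimate is within a `slack` factor of
    the leader's. Returns the indices of surviving attributes.
    (`slack` is an assumption of this sketch, not Catlett's criterion.)"""
    rng = rng or random.Random(0)
    idx = rng.sample(range(len(examples)), min(sample_size, len(examples)))
    sub_x = [examples[i] for i in idx]
    sub_y = [labels[i] for i in idx]
    n_attrs = len(examples[0])
    gains = []
    for a in range(n_attrs):
        g, _ = best_split_gain([row[a] for row in sub_x], sub_y)
        gains.append(g)
    leader = max(gains)
    return [a for a in range(n_attrs) if gains[a] >= slack * leader]
```

On a dataset where attribute 0 determines the class and attribute 1 is noise, the filter typically retains only attribute 0 from a modest subsample, so the full training set need never be sorted and scanned for the eliminated attribute:

```python
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]
print(peephole_filter(X, y, sample_size=50))  # attribute 1 is usually dropped
```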

Cite

Text

Catlett. "Peepholing: Choosing Attributes Efficiently for Megainduction." International Conference on Machine Learning, 1992. doi:10.1016/B978-1-55860-247-2.50012-7

Markdown

[Catlett. "Peepholing: Choosing Attributes Efficiently for Megainduction." International Conference on Machine Learning, 1992.](https://mlanthology.org/icml/1992/catlett1992icml-peepholing/) doi:10.1016/B978-1-55860-247-2.50012-7

BibTeX

@inproceedings{catlett1992icml-peepholing,
  title     = {{Peepholing: Choosing Attributes Efficiently for Megainduction}},
  author    = {Catlett, Jason},
  booktitle = {International Conference on Machine Learning},
  year      = {1992},
  pages     = {49--54},
  doi       = {10.1016/B978-1-55860-247-2.50012-7},
  url       = {https://mlanthology.org/icml/1992/catlett1992icml-peepholing/}
}