On the Number of Examples and Stages Needed for Learning Decision Trees

Abstract

We show that VCdim(r-DT_n) = ∑_{i=0}^{r} (n choose i), where r-DT_n denotes the set of all Boolean functions on n Boolean variables defined by decision trees of rank at most r, and VCdim(r-DT_n) denotes its Vapnik–Chervonenkis dimension. It follows that the number of examples needed for learning r-DT_n can be determined up to a small factor. We show furthermore that Ω(log log log(|T|)) stages are necessary in the worst case for learning arbitrary decision trees T, if the sample size of the learning algorithm is bounded by some function of the form 2^{polylog}. This bound is tight, because a variant of the algorithm of Ehrenfeucht and Haussler uses at most log log log(|T|) stages and has a running time proportional to n^{log₂(|T|)} (and polynomial in the remaining parameters).
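The VC dimension formula in the abstract is just a truncated sum of binomial coefficients. A minimal sketch of evaluating it (the function name is mine, not from the paper):

```python
from math import comb

def vcdim_rank_r_trees(n: int, r: int) -> int:
    """VC dimension of rank-at-most-r decision trees on n Boolean
    variables, per the abstract's formula: sum_{i=0}^{r} C(n, i)."""
    return sum(comb(n, i) for i in range(r + 1))

# Sanity checks: rank-n trees express every Boolean function on n
# variables, so at r = n the sum collapses to 2^n; at r = 1 it is 1 + n.
print(vcdim_rank_r_trees(10, 10))  # 1024
print(vcdim_rank_r_trees(10, 1))   # 11
```

Since the rank of a decision tree T is at most log₂(|T|), this sum also bounds the sample complexity of learning general trees of a given size.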

Cite

Text

Simon. "On the Number of Examples and Stages Needed for Learning Decision Trees." Annual Conference on Computational Learning Theory, 1990. doi:10.1016/B978-1-55860-146-8.50026-6

Markdown

[Simon. "On the Number of Examples and Stages Needed for Learning Decision Trees." Annual Conference on Computational Learning Theory, 1990.](https://mlanthology.org/colt/1990/simon1990colt-number/) doi:10.1016/B978-1-55860-146-8.50026-6

BibTeX

@inproceedings{simon1990colt-number,
  title     = {{On the Number of Examples and Stages Needed for Learning Decision Trees}},
  author    = {Simon, Hans Ulrich},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {1990},
  pages     = {303-313},
  doi       = {10.1016/B978-1-55860-146-8.50026-6},
  url       = {https://mlanthology.org/colt/1990/simon1990colt-number/}
}