Small Sample Decision Tree Pruning

Abstract

We evaluate the performance of weakest-link decision tree pruning using cross-validation. This technique maps tree pruning into a problem of tree selection: find the best tree, i.e., the right-sized tree, from a set of trees ranging in size from the unpruned tree to a null tree. For small samples (no more than 200 cases), extensive empirical evidence supports the following conclusions relative to tree selection: (a) 10-fold cross-validation is nearly unbiased; (b) not pruning a covering tree is highly biased; (c) with at least 100 samples, 10-fold cross-validation usually outperforms not pruning; (d) with at least 50 samples, a strategy based on 2-fold cross-validation is generally more effective than both 10-fold cross-validation and not pruning; (e) with fewer than 50 samples, estimator accuracy is highly variable, with a substantial risk of overoptimism for not pruning on noisy data.
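The selection procedure the abstract describes can be sketched with scikit-learn, whose cost-complexity pruning implements the same weakest-link idea: pruning yields a nested sequence of subtrees indexed by a complexity parameter, and 10-fold cross-validation picks the right-sized one. This is a minimal illustration using a synthetic 200-case dataset, not the authors' original implementation.

```python
# Sketch: weakest-link (cost-complexity) pruning with 10-fold CV tree selection.
# scikit-learn stands in for the paper's tree learner; data is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Weakest-link pruning produces a nested family of subtrees, one per alpha,
# from the unpruned covering tree (alpha=0) down to a single-node null tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Tree selection: estimate each candidate subtree's accuracy by 10-fold CV.
scores = [
    cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=a), X, y, cv=10
    ).mean()
    for a in path.ccp_alphas
]

# Keep the alpha (i.e., the subtree size) with the best cross-validated score.
best_alpha = path.ccp_alphas[int(np.argmax(scores))]
best_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)
print(best_tree.get_n_leaves())
```

Swapping `cv=10` for `cv=2` gives the 2-fold variant the abstract recommends for samples of at least 50 cases.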

Cite

Text

Weiss and Indurkhya. "Small Sample Decision Tree Pruning." International Conference on Machine Learning, 1994. doi:10.1016/B978-1-55860-335-6.50048-9

Markdown

[Weiss and Indurkhya. "Small Sample Decision Tree Pruning." International Conference on Machine Learning, 1994.](https://mlanthology.org/icml/1994/weiss1994icml-small/) doi:10.1016/B978-1-55860-335-6.50048-9

BibTeX

@inproceedings{weiss1994icml-small,
  title     = {{Small Sample Decision Tree Pruning}},
  author    = {Weiss, Sholom M. and Indurkhya, Nitin},
  booktitle = {International Conference on Machine Learning},
  year      = {1994},
  pages     = {335--342},
  doi       = {10.1016/B978-1-55860-335-6.50048-9},
  url       = {https://mlanthology.org/icml/1994/weiss1994icml-small/}
}