Small Sample Decision Tree Pruning
Abstract
We evaluate the performance of weakest-link decision tree pruning using cross-validation. This technique maps tree pruning into a problem of tree selection: find the best tree, i.e., the right-sized tree, from a set of trees ranging in size from the unpruned tree to a null tree. For small samples (no more than 200 cases), extensive empirical evidence supports the following conclusions relative to tree selection: (a) 10-fold cross-validation is nearly unbiased; (b) not pruning a covering tree is highly biased; (c) with at least 100 samples, 10-fold cross-validation usually outperforms not pruning; (d) with at least 50 samples, a strategy based on 2-fold cross-validation is generally more effective than both 10-fold cross-validation and not pruning; (e) with fewer than 50 samples, estimator accuracy is highly variable, with a substantial risk of overoptimism for not pruning on noisy data.
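The tree-selection procedure the abstract describes can be sketched in modern terms with scikit-learn, whose minimal cost-complexity pruning is the same weakest-link scheme: generate the nested sequence of subtrees from the unpruned tree down to the null tree, score each candidate by 10-fold cross-validation, and keep the best. The synthetic data, sample size, and random seeds below are illustrative assumptions, not details from the paper.

```python
# Hedged sketch (not the authors' code): weakest-link pruning as tree
# selection via 10-fold cross-validation, using scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# A small sample in the paper's sense: 200 cases (synthetic, for illustration).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Weakest-link pruning path: each ccp_alpha indexes one subtree in the
# nested sequence from the unpruned tree to the null (root-only) tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas

# Tree selection: estimate each candidate subtree's accuracy by 10-fold CV.
cv_scores = [
    cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=a), X, y, cv=10
    ).mean()
    for a in alphas
]

# Pick the "right-sized" tree and refit it on the full sample.
best_alpha = alphas[int(np.argmax(cv_scores))]
best_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)
print(best_tree.get_n_leaves())
```

With fewer than ~50 cases one would expect the CV estimates themselves to be highly variable, which is the regime where the paper's 2-fold-based strategy is reported to help.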
Cite
Text
Weiss and Indurkhya. "Small Sample Decision Tree Pruning." International Conference on Machine Learning, 1994. doi:10.1016/B978-1-55860-335-6.50048-9
Markdown
[Weiss and Indurkhya. "Small Sample Decision Tree Pruning." International Conference on Machine Learning, 1994.](https://mlanthology.org/icml/1994/weiss1994icml-small/) doi:10.1016/B978-1-55860-335-6.50048-9
BibTeX
@inproceedings{weiss1994icml-small,
title = {{Small Sample Decision Tree Pruning}},
author = {Weiss, Sholom M. and Indurkhya, Nitin},
booktitle = {International Conference on Machine Learning},
year = {1994},
pages = {335-342},
doi = {10.1016/B978-1-55860-335-6.50048-9},
url = {https://mlanthology.org/icml/1994/weiss1994icml-small/}
}