Toward a Theoretical Understanding of Why and When Decision Tree Pruning Algorithms Fail
Abstract
Recent empirical studies revealed two surprising pathologies of several common decision tree pruning algorithms. First, tree size is often a linear function of training set size, even when additional tree structure yields no increase in accuracy. Second, building trees with data in which the class label and the attributes are independent often results in large trees. In both cases, the pruning algorithms fail to control tree growth as one would expect them to. We explore this behavior theoretically by constructing a statistical model of reduced error pruning. The model explains why and when the pathologies occur, and makes predictions about how to lessen their effects. The predictions are operationalized in a variant of reduced error pruning that is shown to control tree growth far better than the original algorithm.

Introduction

Despite more than three decades of intense research on decision trees, arguably the most commonly used learning mechanism in implemented AI systems, existin...
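The abstract centers on reduced error pruning (REP): a subtree is replaced by a leaf labeled with its majority class whenever that replacement does not reduce accuracy on a held-out pruning set. A minimal sketch of that procedure, assuming a toy tree representation in which each node already carries the majority class of the training examples that reached it (the `Node` class and the example data below are illustrative, not the authors' implementation):

```python
class Node:
    """Decision tree node. `label` is assumed to be the majority class of
    the training examples that reached this node (precomputed at build time)."""
    def __init__(self, attr=None, children=None, label=None):
        self.attr = attr                  # attribute tested at this node
        self.children = children or {}    # attribute value -> subtree
        self.label = label                # majority class at this node

    def is_leaf(self):
        return not self.children

def classify(node, example):
    # Descend until a leaf (or an unseen attribute value), then predict
    # the majority class stored at the node where descent stopped.
    while not node.is_leaf():
        child = node.children.get(example.get(node.attr))
        if child is None:
            break
        node = child
    return node.label

def accuracy(root, pruning_set):
    return sum(classify(root, x) == y for x, y in pruning_set) / len(pruning_set)

def reduced_error_prune(root, node, pruning_set):
    """Bottom-up REP: collapse a subtree to a leaf if held-out accuracy
    does not drop; otherwise restore the subtree."""
    for child in list(node.children.values()):
        reduced_error_prune(root, child, pruning_set)
    if node.is_leaf():
        return
    before = accuracy(root, pruning_set)
    saved_children, saved_attr = node.children, node.attr
    node.children, node.attr = {}, None   # tentatively prune to a leaf
    if accuracy(root, pruning_set) < before:
        node.children, node.attr = saved_children, saved_attr  # undo

# Example: attribute 'a' is independent of the class, so the split below
# fits noise and REP should collapse the tree to a single leaf.
root = Node('a', {0: Node(label=0), 1: Node(label=1)}, label=0)
pruning_set = [({'a': 0}, 0), ({'a': 1}, 0), ({'a': 0}, 0), ({'a': 1}, 0)]
reduced_error_prune(root, root, pruning_set)
```

This is exactly the failure setting the paper studies: with class-independent attributes, whether REP actually collapses such splits depends on the pruning-set estimates, which is what the authors' statistical model analyzes.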
Cite

Text:
Oates and Jensen. "Toward a Theoretical Understanding of Why and When Decision Tree Pruning Algorithms Fail." AAAI Conference on Artificial Intelligence, 1999.

Markdown:
[Oates and Jensen. "Toward a Theoretical Understanding of Why and When Decision Tree Pruning Algorithms Fail." AAAI Conference on Artificial Intelligence, 1999.](https://mlanthology.org/aaai/1999/oates1999aaai-theoretical/)

BibTeX:
@inproceedings{oates1999aaai-theoretical,
title = {{Toward a Theoretical Understanding of Why and When Decision Tree Pruning Algorithms Fail}},
author = {Oates, Tim and Jensen, David D.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {1999},
  pages = {372--378},
url = {https://mlanthology.org/aaai/1999/oates1999aaai-theoretical/}
}