Increasing the Performance and Consistency of Classification Trees by Using the Accuracy Criterion at the Leaves
Abstract
The traditional split criteria in tree induction (Gini, Entropy and others) do not minimize the number of misclassifications at each node, and hence cannot correctly estimate the parameters of a tree, even if the underlying model can be correctly represented by a tree procedure. We examine this effect and show that the difference in accuracy can be as much as 15% in the worst case. We prove that, under the Gini criterion, trees of unbounded size may have to be grown in order to correctly estimate a model. We then give a procedure that is guaranteed to produce finite trees, and define a modification to the standard tree-growing methodology that yields improvements in predictive accuracy of 1% to 5% on datasets from the UCI repository.
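The abstract's central observation, that Gini does not minimize misclassifications, can be illustrated with the classic 400/400 counter-example from the CART literature. The sketch below is not the paper's code; the helper functions are hypothetical. Two candidate splits leave the same number of misclassified examples, yet Gini strictly prefers one of them:

```python
# Sketch (assumed example, not from the paper): compare the size-weighted
# impurity of two binary splits under the Gini criterion vs. the
# misclassification (accuracy) criterion.

def gini(counts):
    """Gini impurity of a node given per-class example counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def misclass(counts):
    """Misclassification rate of a node labelled with its majority class."""
    n = sum(counts)
    return 1.0 - max(counts) / n

def split_score(children, impurity):
    """Impurity of a split: children weighted by their share of examples."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * impurity(c) for c in children)

# Parent node: 400 examples of each of two classes.
split_a = [(300, 100), (100, 300)]   # two equally impure children
split_b = [(200, 400), (200, 0)]     # one pure child, one impure child

# Both splits misclassify 200 of 800 examples (rate 0.25)...
print(split_score(split_a, misclass), split_score(split_b, misclass))
# ...but Gini strictly prefers split B (0.333... vs 0.375).
print(split_score(split_a, gini), split_score(split_b, gini))
```

Since the accuracy criterion is indifferent between the two splits while Gini is not, a tree grown greedily under Gini can commit to structure that does not reduce the error count, which is the kind of mismatch the paper's leaf-level accuracy criterion is designed to avoid.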
Cite
Text
Lubinsky. "Increasing the Performance and Consistency of Classification Trees by Using the Accuracy Criterion at the Leaves." International Conference on Machine Learning, 1995. doi:10.1016/B978-1-55860-377-6.50053-0
Markdown
[Lubinsky. "Increasing the Performance and Consistency of Classification Trees by Using the Accuracy Criterion at the Leaves." International Conference on Machine Learning, 1995.](https://mlanthology.org/icml/1995/lubinsky1995icml-increasing/) doi:10.1016/B978-1-55860-377-6.50053-0
BibTeX
@inproceedings{lubinsky1995icml-increasing,
title = {{Increasing the Performance and Consistency of Classification Trees by Using the Accuracy Criterion at the Leaves}},
author = {Lubinsky, David J.},
booktitle = {International Conference on Machine Learning},
year = {1995},
pages = {371--377},
doi = {10.1016/B978-1-55860-377-6.50053-0},
url = {https://mlanthology.org/icml/1995/lubinsky1995icml-increasing/}
}