Statistical Preprocessing for Decision Tree Induction
Abstract
Some apparently simple numeric data sets cause significant problems for existing decision tree induction algorithms, in that no method is able to find a small, accurate tree, even though one exists. One source of this difficulty is the goodness measures used to decide whether a particular node represents a good way to split the data. This paper points out that the commonly used goodness measures are not equipped to take certain patterns in numeric attribute spaces into account, and presents a framework for incorporating such patterns into decision tree induction. As a case study, it is demonstrated empirically that supervised clustering, when used as a preprocessing step, can improve the quality of both univariate and multivariate decision trees.
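The sketch below illustrates the general idea of supervised clustering as a preprocessing step before tree induction, not the paper's exact procedure: examples of each class are clustered separately, each example is augmented with its distances to the resulting per-class centroids, and a decision tree is then induced on the augmented attributes. The synthetic data, the distance-to-centroid augmentation, and the clusters_per_class setting are all illustrative assumptions.

# A minimal sketch, assuming class-wise k-means clustering and
# distance-to-centroid features; this is illustrative, not the paper's method.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_class_centroids(X, y, clusters_per_class=2, random_state=0):
    """Cluster the examples of each class separately and collect the centroids."""
    centroids = []
    for label in np.unique(y):
        km = KMeans(n_clusters=clusters_per_class, n_init=10,
                    random_state=random_state)
        km.fit(X[y == label])
        centroids.append(km.cluster_centers_)
    return np.vstack(centroids)


def add_centroid_distances(X, centroids):
    """Augment the original attributes with distances to every centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.hstack([X, dists])


if __name__ == "__main__":
    X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                               n_redundant=0, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    centroids = fit_class_centroids(X_tr, y_tr)
    tree = DecisionTreeClassifier(max_depth=5, random_state=0)
    tree.fit(add_centroid_distances(X_tr, centroids), y_tr)
    print("test accuracy:",
          tree.score(add_centroid_distances(X_te, centroids), y_te))

Because the augmented attributes encode cluster structure that axis-parallel or oblique splits alone may not capture, the tree can sometimes separate the classes with fewer nodes than it would on the raw attributes.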
Cite

Text

Murthy. "Statistical Preprocessing for Decision Tree Induction." Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, 1995.

Markdown

[Murthy. "Statistical Preprocessing for Decision Tree Induction." Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, 1995.](https://mlanthology.org/aistats/1995/murthy1995aistats-statistical/)

BibTeX
@inproceedings{murthy1995aistats-statistical,
title = {{Statistical Preprocessing for Decision Tree Induction}},
author = {Murthy, Sreerama K.},
booktitle = {Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics},
year = {1995},
pages = {403-409},
volume = {R0},
url = {https://mlanthology.org/aistats/1995/murthy1995aistats-statistical/}
}