Global Data Analysis and the Fragmentation Problem in Decision Tree Induction
Abstract
We investigate an inherent limitation of top-down decision tree induction in which the continuous partitioning of the instance space progressively lessens the statistical support of every partial (i.e. disjunctive) hypothesis, known as the fragmentation problem. We show, both theoretically and empirically, how the fragmentation problem adversely affects predictive accuracy as variation ∇ (a measure of concept difficulty) increases. Applying feature-construction techniques at every tree node, which we implement in a decision tree inducer, DALI, is proved to solve the fragmentation problem only partially. Our study illustrates how a more robust solution must also assess the value of each partial hypothesis by resorting to all available training data, an approach we name global data analysis, which decision tree induction alone is unable to accomplish. The value of global data analysis is evaluated by comparing modified versions of C4.5 rules with C4.5 trees and DALI, on both artificial and real-world domains. Empirical results suggest the importance of combining both feature construction and global data analysis to solve the fragmentation problem.
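The loss of statistical support described in the abstract can be illustrated with a minimal sketch. It is not the paper's method, DALI, or C4.5. It assumes uniformly random Boolean features and a hypothetical full binary tree that splits on feature 0, then feature 1, and so on; the function name `leaf_support` and all parameter values are illustrative assumptions.

```python
import random


def leaf_support(n_examples, n_features, depth, seed=0):
    """Count the training examples that reach each leaf of a full binary tree.

    Assumption: the tree splits on features 0, 1, ..., depth-1 in order,
    so a leaf is identified by the first `depth` feature values.
    """
    rng = random.Random(seed)
    # Assumption: synthetic data with independent, uniformly random Boolean features.
    data = [
        tuple(rng.randint(0, 1) for _ in range(n_features))
        for _ in range(n_examples)
    ]
    counts = {}
    for x in data:
        leaf = x[:depth]  # path taken through the tree
        counts[leaf] = counts.get(leaf, 0) + 1
    return counts


if __name__ == "__main__":
    n = 256
    for depth in range(7):
        counts = leaf_support(n, n_features=8, depth=depth)
        # Each split divides a node's examples among its children, so the
        # support available to any single leaf can only shrink with depth.
        print(depth, len(counts), max(counts.values()), min(counts.values()))
```

Every split partitions the examples at a node among its children, so each leaf's estimate rests on fewer examples than its parent's. A rule-based evaluation that scores each partial hypothesis against the whole training set, as the abstract proposes under the name global data analysis, does not face this shrinkage.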
Cite
Text
Vilalta et al. "Global Data Analysis and the Fragmentation Problem in Decision Tree Induction." European Conference on Machine Learning, 1997. doi:10.1007/3-540-62858-4_95
Markdown
[Vilalta et al. "Global Data Analysis and the Fragmentation Problem in Decision Tree Induction." European Conference on Machine Learning, 1997.](https://mlanthology.org/ecmlpkdd/1997/vilalta1997ecml-global/) doi:10.1007/3-540-62858-4_95
BibTeX
@inproceedings{vilalta1997ecml-global,
title = {{Global Data Analysis and the Fragmentation Problem in Decision Tree Induction}},
author = {Vilalta, Ricardo and Blix, Gunnar and Rendell, Larry A.},
booktitle = {European Conference on Machine Learning},
year = {1997},
pages = {312-326},
doi = {10.1007/3-540-62858-4_95},
url = {https://mlanthology.org/ecmlpkdd/1997/vilalta1997ecml-global/}
}