Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction
Abstract
In data analysis, induction of decision trees serves two main goals: first, induced decision trees can be used for classification/prediction of new instances, and second, they represent an easy-to-interpret model of the problem domain that can be used for explanation. The accuracy of the induced classifier is usually estimated using N-fold cross-validation, whereas for explanation purposes a decision tree induced from all the available data is used. Decision tree learning is relatively non-robust: a small change in the training set may significantly change the structure of the induced decision tree. This paper presents a decision tree construction method in which the domain model is constructed by consensus clustering of N decision trees induced in N-fold cross-validation. Experimental results show that consensus decision trees are simpler than C4.5 decision trees, indicating that they may be a more stable approximation of the intended domain model than a decision tree constructed from the entire set of training instances.
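The sketch below is a loose illustration of the idea described in the abstract, not the authors' exact procedure: N trees are induced in N-fold cross-validation, instances are grouped by the agreement of the trees' predictions, each group is relabelled with its majority class, and a final tree is induced from the relabelled data. It assumes scikit-learn's DecisionTreeClassifier as a stand-in for C4.5 and AgglomerativeClustering as a stand-in for the paper's consensus hierarchical clustering step.

```python
# Illustrative sketch only; the paper's consensus hierarchical clustering and
# data relabelling/reduction procedure differ in detail.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import AgglomerativeClustering

X, y = load_iris(return_X_y=True)
n_folds = 10

# Induce one tree per cross-validation fold and record its predictions on all instances.
votes = []
for train_idx, _ in KFold(n_splits=n_folds, shuffle=True, random_state=0).split(X):
    tree = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    votes.append(tree.predict(X))
votes = np.array(votes).T  # shape: (n_instances, n_folds)

# Consensus step (approximation): cluster instances whose prediction profiles agree.
clusters = AgglomerativeClustering(n_clusters=len(np.unique(y))).fit_predict(votes)

# Relabel every instance with the majority class of its cluster.
y_consensus = y.copy()
for c in np.unique(clusters):
    members = clusters == c
    y_consensus[members] = np.bincount(y[members]).argmax()

# Final "consensus" tree induced from the relabelled data; typically smaller
# than a tree induced directly from the original labels.
consensus_tree = DecisionTreeClassifier(random_state=0).fit(X, y_consensus)
print(consensus_tree.get_n_leaves())
```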
Cite
Text
Kavsek et al. "Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction." European Conference on Machine Learning, 2001. doi:10.1007/3-540-44795-4_22

Markdown

[Kavsek et al. "Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction." European Conference on Machine Learning, 2001.](https://mlanthology.org/ecmlpkdd/2001/kavsek2001ecml-consensus/) doi:10.1007/3-540-44795-4_22

BibTeX
@inproceedings{kavsek2001ecml-consensus,
title = {{Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction}},
author = {Kavsek, Branko and Lavrac, Nada and Ferligoj, Anuska},
booktitle = {European Conference on Machine Learning},
year = {2001},
pages = {251-262},
doi = {10.1007/3-540-44795-4_22},
url = {https://mlanthology.org/ecmlpkdd/2001/kavsek2001ecml-consensus/}
}