Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction

Abstract

In data analysis, induction of decision trees serves two main goals: first, induced decision trees can be used to classify or predict new instances, and second, they provide an easy-to-interpret model of the problem domain that can be used for explanation. The accuracy of the induced classifier is usually estimated using N-fold cross-validation, whereas for explanation purposes a decision tree induced from all the available data is used. Decision tree learning is relatively non-robust: a small change in the training set may significantly change the structure of the induced decision tree. This paper presents a decision tree construction method in which the domain model is constructed by consensus clustering of the N decision trees induced in N-fold cross-validation. Experimental results show that consensus decision trees are simpler than C4.5 decision trees, indicating that they may be a more stable approximation of the intended domain model than a decision tree constructed from the entire set of training instances.
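As a rough illustration of the starting point described in the abstract, the sketch below induces one model per fold of an N-fold cross-validation and collects the N results. It is not the authors' method: a one-level decision stump stands in for a C4.5 tree, the consensus clustering step is not shown, and the names `n_fold_splits`, `induce_stump`, and the toy `data` are hypothetical.

```python
from collections import Counter


def n_fold_splits(instances, n_folds):
    """Yield (train, test) pairs; fold i holds out every n_folds-th instance."""
    for i in range(n_folds):
        test = instances[i::n_folds]
        train = [x for j, x in enumerate(instances) if j % n_folds != i]
        yield train, test


def majority(labels):
    """Most frequent label (ties resolved by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def induce_stump(train):
    """Return (errors, feature, threshold, left_label, right_label).

    A stump is a one-split tree, used here only as a stand-in learner
    (assumption: the paper itself works with C4.5 trees).
    """
    best = None
    for f in range(len(train[0][0])):
        for threshold in sorted({x[f] for x, _ in train}):
            left = [y for x, y in train if x[f] <= threshold]
            right = [y for x, y in train if x[f] > threshold]
            if not left or not right:
                continue  # a split must send instances to both sides
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_lab, r_lab)
    return best


# Toy data (hypothetical): (feature vector, class label)
data = [
    ((1.0, 5.0), "a"), ((2.0, 4.0), "a"), ((3.0, 6.0), "a"),
    ((4.0, 1.0), "b"), ((5.0, 2.0), "b"), ((6.0, 3.0), "b"),
]

# One model per fold: the N models a consensus step would then combine.
stumps = [induce_stump(train) for train, _ in n_fold_splits(data, 3)]
for errors, feature, threshold, left, right in stumps:
    print(f"feature {feature} <= {threshold}: {left} else {right} ({errors} errors)")
```

Even on this tiny data set the chosen split threshold is not identical across folds, which is a small-scale version of the instability the abstract mentions.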

Cite

Text

Kavsek et al. "Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction." European Conference on Machine Learning, 2001. doi:10.1007/3-540-44795-4_22

Markdown

[Kavsek et al. "Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction." European Conference on Machine Learning, 2001.](https://mlanthology.org/ecmlpkdd/2001/kavsek2001ecml-consensus/) doi:10.1007/3-540-44795-4_22

BibTeX

@inproceedings{kavsek2001ecml-consensus,
  title     = {{Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction}},
  author    = {Kavsek, Branko and Lavrac, Nada and Ferligoj, Anuska},
  booktitle = {European Conference on Machine Learning},
  year      = {2001},
  pages     = {251-262},
  doi       = {10.1007/3-540-44795-4_22},
  url       = {https://mlanthology.org/ecmlpkdd/2001/kavsek2001ecml-consensus/}
}