Learning Decision Trees Using the Area Under the ROC Curve
Abstract
ROC analysis is increasingly being recognised as an important tool for evaluation and comparison of classifiers when the operating characteristics (i.e. class distribution and cost parameters) are not known at training time. Usually, each classifier is characterised by its estimated true and false positive rates and is represented by a single point in the ROC diagram. In this paper, we show how a single decision tree can represent a set of classifiers by choosing different labellings of its leaves, or equivalently, an ordering on the leaves. In this setting, rather than estimating the accuracy of a single tree, it makes more sense to use the area under the ROC curve (AUC) as a quality metric. We also propose a novel splitting criterion which chooses the split with the highest local AUC. To the best of our knowledge, this is the first probabilistic splitting criterion that is not based on weighted average impurity. We present experiments suggesting that the AUC splitting criterion leads to trees with equal or better AUC value, without sacrificing accuracy if a single labelling is chosen.
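The abstract describes two ideas: a tree viewed as a family of classifiers obtained from an ordering of its leaves, and a splitting criterion based on local AUC. The following is a minimal Python sketch of both. It assumes each leaf (or each child of a candidate split) is summarised only by its counts of positive and negative training examples. It orders leaves by decreasing positive proportion and computes the area under the resulting ROC curve with the trapezoidal rule. The function names (`leaf_ordering`, `roc_points`, `tree_auc`, `best_split_by_local_auc`) are hypothetical. Scoring a split by the AUC of its children treated as leaves is one reading of "the split with the highest local AUC", not necessarily the paper's exact formulation.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a leaf (or a child of a candidate split) is
# summarised by (positives, negatives), the counts of training examples reaching it.
Counts = Tuple[int, int]


def leaf_ordering(leaves: Sequence[Counts]) -> List[Counts]:
    """Sort non-empty leaves by decreasing proportion of positive examples.

    Labelling the first k leaves positive and the rest negative, for
    k = 0..len(leaves), yields one classifier per k.
    """
    nonempty = [c for c in leaves if c[0] + c[1] > 0]  # empty leaves add no point
    return sorted(nonempty, key=lambda c: Fraction(c[0], c[0] + c[1]), reverse=True)


def roc_points(leaves: Sequence[Counts]) -> List[Tuple[Fraction, Fraction]]:
    """Return the (FPR, TPR) points of the classifiers induced by the ordering."""
    ordered = leaf_ordering(leaves)
    total_pos = sum(p for p, _ in ordered)
    total_neg = sum(n for _, n in ordered)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    points = [(Fraction(0), Fraction(0))]
    tp = fp = 0
    for p, n in ordered:
        tp += p  # labelling this leaf positive adds its positives as true positives
        fp += n  # ... and its negatives as false positives
        points.append((Fraction(fp, total_neg), Fraction(tp, total_pos)))
    return points


def tree_auc(leaves: Sequence[Counts]) -> Fraction:
    """Area under the ROC curve through the points above (trapezoidal rule)."""
    pts = roc_points(leaves)
    return sum(
        ((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:])),
        Fraction(0),
    )


def best_split_by_local_auc(candidates: Dict[str, Sequence[Counts]]) -> str:
    """Pick the candidate split whose children, treated as leaves, give the highest AUC."""
    return max(candidates, key=lambda name: tree_auc(candidates[name]))


if __name__ == "__main__":
    print(tree_auc([(30, 10), (10, 30), (20, 20)]))  # 13/18
    splits = {"a": [(30, 10), (10, 30)], "b": [(20, 5), (20, 35)]}
    print(best_split_by_local_auc(splits))  # a
```

Each prefix of the ordering corresponds to one labelling of the leaves, so a single curve summarises all of them. Exact fractions keep the area free of rounding. In the example, both candidate splits partition the same node of 40 positives and 40 negatives. Split `a` scores 3/4 and split `b` scores 11/16.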
Cite
Text
Ferri et al. "Learning Decision Trees Using the Area Under the ROC Curve." International Conference on Machine Learning, 2002.
Markdown
[Ferri et al. "Learning Decision Trees Using the Area Under the ROC Curve." International Conference on Machine Learning, 2002.](https://mlanthology.org/icml/2002/ferri2002icml-learning/)
BibTeX
@inproceedings{ferri2002icml-learning,
title = {{Learning Decision Trees Using the Area Under the ROC Curve}},
author = {Ferri, César and Flach, Peter A. and Hernández-Orallo, José},
booktitle = {International Conference on Machine Learning},
year = {2002},
pages = {139--146},
url = {https://mlanthology.org/icml/2002/ferri2002icml-learning/}
}