Boosted Classification Trees and Class Probability/Quantile Estimation

Abstract

The standard by which binary classifiers are usually judged, misclassification error, assumes equal costs of misclassifying the two classes or, equivalently, classifying at the 1/2 quantile of the conditional class probability function P[y=1|x]. Boosted classification trees are known to perform quite well for such problems. In this article we consider the use of standard, off-the-shelf boosting for two more general problems: 1) classification with unequal costs or, equivalently, classification at quantiles other than 1/2, and 2) estimation of the conditional class probability function P[y=1|x]. We first examine whether the latter problem, estimation of P[y=1|x], can be solved with LogitBoost, and with AdaBoost when combined with a natural link function. The answer is negative: both approaches are often ineffective because they overfit P[y=1|x] even though they perform well as classifiers. A central negative finding of this article is therefore the disconnect between class probability estimation and classification.
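The "natural link function" for AdaBoost referred to above is the logistic map p(x) = 1/(1 + exp(-2F(x))), which follows from the population minimizer of the exponential loss, F(x) = (1/2) log(P[y=1|x]/P[y=0|x]). A minimal sketch of this link and of cost-sensitive classification at a quantile q other than 1/2 is below; the function names and toy scores are illustrative, not from the paper:

```python
import numpy as np

def adaboost_link(F):
    """Map a real-valued AdaBoost score F(x) to an estimate of
    P[y=1|x] via the logistic link p = 1 / (1 + exp(-2F)),
    implied by the population minimizer of exponential loss."""
    return 1.0 / (1.0 + np.exp(-2.0 * F))

def classify_at_quantile(p, q):
    """Cost-sensitive rule: predict class 1 iff the estimated
    P[y=1|x] exceeds the quantile q. With misclassification
    costs c0, c1 the optimal q is c1 / (c0 + c1); equal costs
    give q = 1/2, the usual misclassification-error rule."""
    return (p > q).astype(int)

# Toy scores from a hypothetical boosted ensemble.
F = np.array([-2.0, -0.3, 0.0, 0.4, 1.5])
p = adaboost_link(F)

print(classify_at_quantile(p, 0.5))  # equal-cost rule
print(classify_at_quantile(p, 0.8))  # cost-sensitive rule, q = 0.8
```

The paper's negative finding is precisely that plugging overfit boosting scores into this link yields poor probability estimates even when the induced classifier at q = 1/2 is accurate.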

Cite

Text

Mease et al. "Boosted Classification Trees and Class Probability/Quantile Estimation." Journal of Machine Learning Research, 2007.

Markdown

[Mease et al. "Boosted Classification Trees and Class Probability/Quantile Estimation." Journal of Machine Learning Research, 2007.](https://mlanthology.org/jmlr/2007/mease2007jmlr-boosted/)

BibTeX

@article{mease2007jmlr-boosted,
  title     = {{Boosted Classification Trees and Class Probability/Quantile Estimation}},
  author    = {Mease, David and Wyner, Abraham J. and Buja, Andreas},
  journal   = {Journal of Machine Learning Research},
  year      = {2007},
  pages     = {409--439},
  volume    = {8},
  url       = {https://mlanthology.org/jmlr/2007/mease2007jmlr-boosted/}
}