Smoothed Bootstrap and Statistical Data Cloning for Classifier Evaluation

Abstract

This work is concerned with the estimation of a classifier's accuracy. We first review some existing methods for error estimation, focusing on cross-validation and bootstrap, and motivate the use of kernel-based smoothing for small sample sizes. We use the term data cloning to refer to the process of (re)sampling the data via kernel-based smoothed bootstrap. A number of novel estimators based on cloning are presented. Finally, we extend our estimators to allow cloning of complex real-life data sets, in which a data point may include continuous, bounded, integer and nominal attributes. This allows for better classifier evaluation over heterogeneous real data repositories with a limited amount of data, such as the UCI repository. We use the root mean squared error (RMSE) as a measure of estimator quality and support this choice with a probabilistic argument. Using this measure, we report on a set of 28 experiments in which the new cloning methods outperform cross-validation as well as the .632+ bootstrap, which, according to Efron and Tibshirani (1997), is the estimator of choice. Although the proposed estimators require more computational effort than the established ones, the increased time complexity is within a constant factor of that of the relevant traditional estimators. Based on the motivation and the empirical results, we suggest that the cloning-based .632+ estimator is superior to the other estimators, and note bootstrapped cross-validation as the second choice.

Keywords: Classifier evaluation, empirical error estimation, data cloning, smoothed bootstrap
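The cloning step described in the abstract can be illustrated with a minimal sketch of a smoothed bootstrap for purely continuous attributes. This is not the paper's implementation: the Gaussian kernel, the single fixed `bandwidth`, and the function name `smoothed_bootstrap` are assumptions made for illustration only, and the abstract does not specify how the kernel or its width is chosen or how bounded, integer and nominal attributes are handled.

```python
import random


def smoothed_bootstrap(points, n_samples, bandwidth, rng=None):
    """Draw n_samples 'clones' from a list of continuous data points.

    Each clone is obtained by picking an original point uniformly at random
    with replacement (the ordinary bootstrap step) and then perturbing every
    coordinate with zero-mean Gaussian noise (the smoothing step).
    Assumption: an isotropic Gaussian kernel with one fixed bandwidth.
    """
    rng = rng or random.Random()
    clones = []
    for _ in range(n_samples):
        base = rng.choice(points)  # resample with replacement
        clones.append([x + rng.gauss(0.0, bandwidth) for x in base])
    return clones


# Hypothetical toy data: three points with two continuous attributes each.
data = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
clones = smoothed_bootstrap(data, n_samples=5, bandwidth=0.1, rng=random.Random(0))
```

With `bandwidth` set to 0 the sketch reduces to the ordinary bootstrap, since every clone is then an exact copy of some original point. Larger bandwidths spread the clones around the observed points, which is the effect the abstract motivates for small samples.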

Cite

Text

Shakhnarovich et al. "Smoothed Bootstrap and Statistical Data Cloning for Classifier Evaluation." International Conference on Machine Learning, 2001.

Markdown

[Shakhnarovich et al. "Smoothed Bootstrap and Statistical Data Cloning for Classifier Evaluation." International Conference on Machine Learning, 2001.](https://mlanthology.org/icml/2001/shakhnarovich2001icml-smoothed/)

BibTeX

@inproceedings{shakhnarovich2001icml-smoothed,
  title     = {{Smoothed Bootstrap and Statistical Data Cloning for Classifier Evaluation}},
  author    = {Shakhnarovich, Gregory and El-Yaniv, Ran and Baram, Yoram},
  booktitle = {International Conference on Machine Learning},
  year      = {2001},
  pages     = {521-528},
  url       = {https://mlanthology.org/icml/2001/shakhnarovich2001icml-smoothed/}
}