Smoothed Bootstrap and Statistical Data Cloning for Classifier Evaluation
Abstract
This work is concerned with the estimation of a classifier's accuracy. We first review some existing methods for error estimation, focusing on cross-validation and bootstrap, and motivate the use of kernel-based smoothing for small sample sizes. We use the term data cloning to refer to the process of (re)sampling the data via kernel-based smoothed bootstrap. A number of novel estimators based on cloning are presented. Finally, we extend our estimators to allow cloning of complex real-life data sets, in which a data point may include continuous, bounded, integer and nominal attributes. This allows for better classifier evaluation over heterogeneous real data repositories with a limited amount of data, such as the UCI repository. We use the root mean squared error (RMSE) as a measure of estimator quality and support this choice with a probabilistic argument. Using this measure, we report on a set of 28 experiments in which the new cloning methods outperform cross-validation as well as the .632+ bootstrap, which, according to Efron and Tibshirani (1997), is the estimator of choice. Although the proposed estimators require more computational effort than the established ones, the increased time complexity is within a constant factor of that of the relevant traditional estimators. Based on the motivation and the empirical results, we suggest that the cloning-based .632+ estimator is superior to the other estimators, and note bootstrapped cross-validation as the second choice.
Keywords: Classifier evaluation, empirical error estimation, data cloning, smoothed bootstrap
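The data cloning step described in the abstract can be illustrated with a minimal sketch of a generic smoothed bootstrap: draw training points with replacement, then perturb each draw with kernel noise. This is not the paper's implementation. The function name `smoothed_bootstrap`, the Gaussian kernel, and the single scalar `bandwidth` are assumptions made for illustration, and the sketch covers continuous attributes only (the paper's handling of bounded, integer and nominal attributes is not reproduced here).

```python
import numpy as np


def smoothed_bootstrap(X, n_samples, bandwidth, rng=None):
    """Draw n_samples 'clones' of the rows of X (hypothetical helper).

    Assumptions: all attributes are continuous, the kernel is an
    isotropic Gaussian, and one scalar bandwidth is shared by all
    attributes. Returns the clones and the indices of their source
    rows, so that class labels can be carried over as y[idx].
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    # Ordinary bootstrap step: sample row indices with replacement.
    idx = rng.integers(0, X.shape[0], size=n_samples)
    # Smoothing step: add zero-mean Gaussian kernel noise to each draw.
    noise = rng.normal(0.0, bandwidth, size=(n_samples, X.shape[1]))
    return X[idx] + noise, idx


if __name__ == "__main__":
    X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
    y = np.array([0, 1, 1])
    clones, idx = smoothed_bootstrap(X, n_samples=5, bandwidth=0.1, rng=0)
    print(clones)   # perturbed copies of rows of X
    print(y[idx])   # labels inherited from the source rows
```

With `bandwidth` set to 0 the sketch reduces to the ordinary bootstrap, which is one way to see the smoothing as an addition to plain resampling rather than a replacement for it.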
Cite
Text
Shakhnarovich et al. "Smoothed Bootstrap and Statistical Data Cloning for Classifier Evaluation." International Conference on Machine Learning, 2001.
Markdown
[Shakhnarovich et al. "Smoothed Bootstrap and Statistical Data Cloning for Classifier Evaluation." International Conference on Machine Learning, 2001.](https://mlanthology.org/icml/2001/shakhnarovich2001icml-smoothed/)
BibTeX
@inproceedings{shakhnarovich2001icml-smoothed,
title = {{Smoothed Bootstrap and Statistical Data Cloning for Classifier Evaluation}},
author = {Shakhnarovich, Gregory and El-Yaniv, Ran and Baram, Yoram},
booktitle = {International Conference on Machine Learning},
year = {2001},
pages = {521-528},
url = {https://mlanthology.org/icml/2001/shakhnarovich2001icml-smoothed/}
}