Complete Cross-Validation for Nearest Neighbor Classifiers
Abstract
Cross-validation is an established technique for estimating the accuracy of a classifier and is normally performed either using a number of random test/train partitions of the data, or using k-fold cross-validation. We present a technique for calculating the complete cross-validation for nearest-neighbor classifiers: i.e., averaging over all desired test/train partitions of the data. This technique is applied to several common classifier variants, such as K-nearest-neighbor, stratified data partitioning, and arbitrary loss functions. We demonstrate, with complexity analysis and experimental timing results, that the technique can be performed in time comparable to k-fold cross-validation, though in effect it averages an exponential number of trials. We show that the results of complete cross-validation have the same bias as those of subsampling and k-fold cross-validation, with some reduction in variance. This algorithm offers significant benefits in terms of both time and accuracy.
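The idea of averaging over every test/train partition without enumerating them can be illustrated for the plain 1-nearest-neighbor case. The sketch below is not taken from the paper; it assumes 0/1 loss, a fixed training-set size `train_size`, and no ties in distance. Under those assumptions, for a held-out point the j-th closest of the other n-1 points is the nearest training neighbor exactly when it is in the training set and the j-1 closer points are not, which happens in C(n-1-j, m-1) of the C(n-1, m) possible training sets. The function names `complete_cv_error` and `brute_force_cv_error` are hypothetical, and the brute-force version is included only as a check on small data.

```python
from itertools import combinations
from math import comb


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def complete_cv_error(points, labels, train_size):
    """Average 1-NN 0/1 error over all partitions with the given training size.

    Assumes distances from each point to the others are distinct (no ties).
    """
    n = len(points)
    m = train_size
    if not 1 <= m <= n - 1:
        raise ValueError("train_size must be between 1 and n - 1")
    denom = comb(n - 1, m)  # number of training sets that exclude the test point
    total = 0.0
    for i in range(n):
        # Rank all other points by distance to the held-out point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sq_dist(points[i], points[j]),
        )
        for rank, j in enumerate(others, start=1):
            # Fraction of training sets in which point j is the nearest neighbor.
            weight = comb(n - 1 - rank, m - 1) / denom
            if labels[j] != labels[i]:
                total += weight
    return total / n


def brute_force_cv_error(points, labels, train_size):
    """Same quantity by explicit enumeration; exponential, for checking only."""
    n = len(points)
    total = 0
    count = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for train in combinations(others, train_size):
            nearest = min(train, key=lambda j: sq_dist(points[i], points[j]))
            total += labels[nearest] != labels[i]
            count += 1
    return total / count


if __name__ == "__main__":
    # Toy 1-D data, chosen so that no two distances from a point coincide.
    pts = [(0.0,), (1.0,), (2.5,), (4.5,), (7.0,), (11.0,)]
    lbl = [0, 0, 1, 0, 1, 1]
    for m in range(1, len(pts)):
        print(m, complete_cv_error(pts, lbl, m), brute_force_cv_error(pts, lbl, m))
```

With `train_size = n - 1` this reduces to ordinary leave-one-out error. The closed-form version needs one sort per point, whereas the enumeration grows combinatorially with the data size, which is the contrast the abstract draws.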
Cite
Text
Mullin and Sukthankar. "Complete Cross-Validation for Nearest Neighbor Classifiers." International Conference on Machine Learning, 2000.
Markdown
[Mullin and Sukthankar. "Complete Cross-Validation for Nearest Neighbor Classifiers." International Conference on Machine Learning, 2000.](https://mlanthology.org/icml/2000/mullin2000icml-complete/)
BibTeX
@inproceedings{mullin2000icml-complete,
title = {{Complete Cross-Validation for Nearest Neighbor Classifiers}},
author = {Mullin, Matthew D. and Sukthankar, Rahul},
booktitle = {International Conference on Machine Learning},
year = {2000},
pages = {639-646},
url = {https://mlanthology.org/icml/2000/mullin2000icml-complete/}
}