Complete Cross-Validation for Nearest Neighbor Classifiers

Abstract

Cross-validation is an established technique for estimating the accuracy of a classifier. It is normally performed either with a number of random test/train partitions of the data or with k-fold cross-validation. We present a technique for calculating the complete cross-validation for nearest-neighbor classifiers, i.e., averaging over all desired test/train partitions of the data. The technique is applied to several common variants, such as K-nearest-neighbor classification, stratified data partitioning, and arbitrary loss functions. We demonstrate, with complexity analysis and experimental timing results, that the technique can be performed in time comparable to k-fold cross-validation, even though it effectively averages over an exponential number of trials. We show that complete cross-validation has the same bias as subsampling and k-fold cross-validation, with some reduction in variance. The algorithm therefore offers significant benefits in terms of both time and accuracy.
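To see how an average over exponentially many partitions can be computed in polynomial time, consider the simplest case: 1-NN, every split uses m training points and the remaining n − m as the test set, Euclidean distance, and ties broken by point index. These are assumptions for illustration; this is a minimal sketch of the counting idea, not the authors' published algorithm, and the function names `complete_cv_1nn_error` and `brute_force_cv_1nn_error` are hypothetical. For a held-out point x, the j-th closest of the other n − 1 points is x's nearest training neighbor in exactly C(n − 1 − j, m − 1) of the C(n − 1, m) training sets that exclude x. By linearity of expectation, the average error over all C(n, m) splits equals the mean over points of these weighted mistakes.

```python
import math
from fractions import Fraction
from itertools import combinations


def complete_cv_1nn_error(points, labels, m):
    """Exact 1-NN error averaged over every split with m training points.

    Assumption (sketch only): each split trains on m points and tests on the
    remaining n - m; distances are Euclidean; ties are broken by index.
    Returns a Fraction so results can be compared exactly.
    """
    n = len(points)
    if not 1 <= m <= n - 1:
        raise ValueError("need 1 <= m <= n - 1")
    splits_without_x = math.comb(n - 1, m)  # training sets that exclude x
    total = Fraction(0)
    for x in range(n):
        # Rank the other n - 1 points by distance to x (ties broken by index).
        ranked = sorted(
            (k for k in range(n) if k != x),
            key=lambda k: (math.dist(points[k], points[x]), k),
        )
        for j, k in enumerate(ranked, start=1):
            # Training sets containing the j-th ranked point but none of the
            # j - 1 closer ones: pick the other m - 1 from the n - 1 - j
            # points ranked after it.
            ways = math.comb(n - 1 - j, m - 1)
            if ways == 0:
                break  # every later rank also has zero such training sets
            if labels[k] != labels[x]:
                total += Fraction(ways, splits_without_x)
    # Averaging over all C(n, m) splits reduces to the mean over held-out points.
    return total / n


def brute_force_cv_1nn_error(points, labels, m):
    """Reference check: enumerate all C(n, m) splits explicitly."""
    n = len(points)
    split_errors = []
    for train in combinations(range(n), m):
        test = [x for x in range(n) if x not in train]
        wrong = 0
        for x in test:
            # Nearest training point, with the same index tie-break as above.
            nearest = min(train, key=lambda k: (math.dist(points[k], points[x]), k))
            wrong += labels[nearest] != labels[x]
        split_errors.append(Fraction(wrong, len(test)))
    return sum(split_errors) / len(split_errors)
```

The closed-form version sorts each point's neighbors once, so it runs in roughly O(n² log n) time regardless of m, while the brute-force check grows with C(n, m). Extending the sketch to K > 1 neighbors, stratification, or other loss functions, as the abstract describes, requires more elaborate counting than shown here.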

Cite

Text

Mullin and Sukthankar. "Complete Cross-Validation for Nearest Neighbor Classifiers." International Conference on Machine Learning, 2000.

Markdown

[Mullin and Sukthankar. "Complete Cross-Validation for Nearest Neighbor Classifiers." International Conference on Machine Learning, 2000.](https://mlanthology.org/icml/2000/mullin2000icml-complete/)

BibTeX

@inproceedings{mullin2000icml-complete,
  title     = {{Complete Cross-Validation for Nearest Neighbor Classifiers}},
  author    = {Mullin, Matthew D. and Sukthankar, Rahul},
  booktitle = {International Conference on Machine Learning},
  year      = {2000},
  pages     = {639--646},
  url       = {https://mlanthology.org/icml/2000/mullin2000icml-complete/}
}