Theoretical Analysis of Cross-Validation for Estimating the Risk of the $k$-Nearest Neighbor Classifier

Abstract

The present work aims at deriving theoretical guaranties on the behavior of some cross-validation procedures applied to the $k$-nearest neighbors ($k$NN) rule in the context of binary classification. Here we focus on the leave-$p$-out cross-validation (L$p$O) used to assess the performance of the $k$NN classifier. Remarkably this L$p$O estimator can be efficiently computed in this context using closed-form formulas derived by Celisse and Mary-Huard (2011). We describe a general strategy to derive moment and exponential concentration inequalities for the L$p$O estimator applied to the $k$NN classifier. Such results are obtained first by exploiting the connection between the L$p$O estimator and U-statistics, and second by making an intensive use of the generalized Efron-Stein inequality applied to the L$1$O estimator. One other important contribution is made by deriving new quantifications of the discrepancy between the L$p$O estimator and the classification error/risk of the $k$NN classifier. The optimality of these bounds is discussed by means of several lower bounds as well as simulation experiments.

Cite

Text

Celisse and Mary-Huard. "Theoretical Analysis of Cross-Validation for Estimating the Risk of the $k$-Nearest Neighbor Classifier." Journal of Machine Learning Research, 2018.

Markdown

[Celisse and Mary-Huard. "Theoretical Analysis of Cross-Validation for Estimating the Risk of the $k$-Nearest Neighbor Classifier." Journal of Machine Learning Research, 2018.](https://mlanthology.org/jmlr/2018/celisse2018jmlr-theoretical/)

BibTeX

@article{celisse2018jmlr-theoretical,
  title     = {{Theoretical Analysis of Cross-Validation for Estimating the Risk of the $k$-Nearest Neighbor Classifier}},
  author    = {Celisse, Alain and Mary-Huard, Tristan},
  journal   = {Journal of Machine Learning Research},
  year      = {2018},
  pages     = {1-54},
  volume    = {19},
  url       = {https://mlanthology.org/jmlr/2018/celisse2018jmlr-theoretical/}
}