On the Bias of K-Fold Cross Validation with Stable Learners

Abstract

This paper investigates the efficiency of the K-fold cross-validation (CV) procedure and a debiased version thereof as a means of estimating the generalization risk of a learning algorithm. We work under the general assumption of uniform algorithmic stability. We show that the K-fold risk estimate may not be consistent under such general stability assumptions, by constructing non-vanishing lower bounds on the error in realistic contexts such as regularized empirical risk minimization and stochastic gradient descent. We thus advocate the use of a debiased version of the K-fold estimate and prove an error bound with exponential tail decay for this version. Our result applies to the large class of uniformly stable algorithms, in contrast to earlier works focusing on specific tasks such as density estimation. We illustrate the relevance of the debiased K-fold CV on a simple model selection problem and demonstrate empirically the usefulness of the promoted approach on real-world classification and regression datasets.
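The sketch below is a minimal illustration of the two quantities the abstract contrasts: the standard K-fold CV risk estimate and a debiased variant. It uses ridge regression (a uniformly stable, regularized ERM learner) on synthetic data, and applies Burman's (1989) classical bias correction as one example of a debiased K-fold estimate; this is not necessarily the exact estimator analyzed in the paper, and the dataset and hyperparameters are illustrative placeholders.

```python
# Illustrative comparison of the plain K-fold CV risk estimate and a
# debiased (Burman-style corrected) version, for a uniformly stable learner.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(scale=0.5, size=n)

K = 5
kf = KFold(n_splits=K, shuffle=True, random_state=0)

cv_risk = 0.0        # standard K-fold CV estimate
fold_full_risks = [] # risk of each fold model evaluated on the full sample
for train_idx, test_idx in kf.split(X):
    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    cv_risk += mean_squared_error(y[test_idx], model.predict(X[test_idx])) / K
    fold_full_risks.append(mean_squared_error(y, model.predict(X)))

# Resubstitution risk of the model trained on all the data.
full_model = Ridge(alpha=1.0).fit(X, y)
resub_risk = mean_squared_error(y, full_model.predict(X))

# Burman-style corrected K-fold estimate:
# CV + resubstitution risk - average full-sample risk of the fold models.
debiased_risk = cv_risk + resub_risk - np.mean(fold_full_risks)

print(f"K-fold CV risk:       {cv_risk:.4f}")
print(f"Debiased K-fold risk: {debiased_risk:.4f}")
```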

Cite

Text

Aghbalou et al. "On the Bias of K-Fold Cross Validation with Stable Learners." Artificial Intelligence and Statistics, 2023.

Markdown

[Aghbalou et al. "On the Bias of K-Fold Cross Validation with Stable Learners." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/aghbalou2023aistats-bias/)

BibTeX

@inproceedings{aghbalou2023aistats-bias,
  title     = {{On the Bias of K-Fold Cross Validation with Stable Learners}},
  author    = {Aghbalou, Anass and Sabourin, Anne and Portier, François},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {3775--3794},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/aghbalou2023aistats-bias/}
}