Asymptotics of K-Fold Cross Validation

Abstract

This paper investigates the asymptotic distribution of the K-fold cross validation error in an i.i.d. setting. As the number of observations n goes to infinity while keeping the number of folds K fixed, the K-fold cross validation error is √n-consistent for the expected out-of-sample error and has an asymptotically normal distribution. A consistent estimate of the asymptotic variance is derived and used to construct asymptotically valid confidence intervals for the expected out-of-sample error. A hypothesis test is developed for comparing two estimators’ expected out-of-sample errors, and a subsampling procedure is used to obtain critical values. Monte Carlo simulations demonstrate the asymptotic validity of our confidence intervals for the expected out-of-sample error and investigate the size and power properties of our test. In our empirical application, we use our estimator selection test to compare the out-of-sample predictive performance of OLS, Neural Networks, and Random Forests for predicting the sale price of a domain name in a GoDaddy expiry auction.
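The abstract's central idea, that the K-fold CV error is asymptotically normal around the expected out-of-sample error, can be sketched as follows. This is a minimal illustration, not the paper's exact variance estimator: it uses OLS with squared-error loss and a simple plug-in standard error based on the sample variance of the per-observation losses. The function name and all parameters are illustrative assumptions.

```python
import numpy as np

def kfold_cv_error_ci(X, y, K=5, seed=0):
    """K-fold CV error for OLS with a normal-approximation 95% CI.

    Illustrative sketch: the CI uses a naive plug-in SE (sample std of
    per-observation losses over sqrt(n)), standing in for the paper's
    consistent asymptotic variance estimate.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    folds = np.array_split(idx, K)  # K roughly equal folds
    losses = np.empty(n)
    for k in range(K):
        test = folds[k]
        train = np.setdiff1d(idx, test)
        # Fit OLS on the K-1 training folds, score on the held-out fold
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        losses[test] = (y[test] - X[test] @ beta) ** 2
    cv_err = losses.mean()
    se = losses.std(ddof=1) / np.sqrt(n)  # plug-in asymptotic SE
    z = 1.96  # approximate 97.5% standard normal quantile
    return cv_err, (cv_err - z * se, cv_err + z * se)

# Example: well-specified linear model with unit noise variance,
# so the expected out-of-sample error is close to 1.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=500)
err, (lo, hi) = kfold_cv_error_ci(X, y, K=5)
print(f"CV error: {err:.3f}, 95% CI: ({lo:.3f}, {hi:.3f})")
```

Because n grows while K stays fixed, the √n rate applies to the pooled per-observation losses, which is why a single sample-variance-based interval over all n held-out losses is a natural starting point.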

Cite

Text

Li. "Asymptotics of K-Fold Cross Validation." Journal of Artificial Intelligence Research, 2023. doi:10.1613/JAIR.1.13974

Markdown

[Li. "Asymptotics of K-Fold Cross Validation." Journal of Artificial Intelligence Research, 2023.](https://mlanthology.org/jair/2023/li2023jair-asymptotics/) doi:10.1613/JAIR.1.13974

BibTeX

@article{li2023jair-asymptotics,
  title     = {{Asymptotics of K-Fold Cross Validation}},
  author    = {Li, Jessie},
  journal   = {Journal of Artificial Intelligence Research},
  year      = {2023},
  pages     = {491--526},
  doi       = {10.1613/JAIR.1.13974},
  volume    = {78},
  url       = {https://mlanthology.org/jair/2023/li2023jair-asymptotics/}
}