Asymptotics of K-Fold Cross Validation
Abstract
This paper investigates the asymptotic distribution of the K-fold cross validation error in an i.i.d. setting. As the number of observations n goes to infinity while the number of folds K stays fixed, the K-fold cross validation error is √n-consistent for the expected out-of-sample error and has an asymptotically normal distribution. A consistent estimate of the asymptotic variance is derived and used to construct asymptotically valid confidence intervals for the expected out-of-sample error. A hypothesis test is developed for comparing two estimators' expected out-of-sample errors, and a subsampling procedure is used to obtain critical values. Monte Carlo simulations demonstrate the asymptotic validity of our confidence intervals for the expected out-of-sample error and investigate the size and power properties of our test. In our empirical application, we use our estimator selection test to compare the out-of-sample predictive performance of OLS, Neural Networks, and Random Forests for predicting the sale price of a domain name in a GoDaddy expiry auction.
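To make the objects in the abstract concrete, here is a minimal sketch of the K-fold cross validation error and a normal-approximation confidence interval for the expected out-of-sample error. The synthetic data, the OLS learner, and the naive plug-in standard error are illustrative assumptions; the paper derives a consistent asymptotic variance estimator, which this simple pointwise-loss variance does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic i.i.d. data (illustrative only, not the paper's application)
n, p, K = 500, 3, 5
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(size=n)

# K-fold CV: hold out each fold, fit OLS on the rest, record squared errors
folds = np.array_split(rng.permutation(n), K)
losses = np.empty(n)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    bhat, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    losses[test_idx] = (y[test_idx] - X[test_idx] @ bhat) ** 2

cv_error = losses.mean()  # K-fold cross validation error

# Naive plug-in standard error (a simplification of the paper's
# consistent asymptotic variance estimator)
se = losses.std(ddof=1) / np.sqrt(n)
ci = (cv_error - 1.96 * se, cv_error + 1.96 * se)
print(cv_error, ci)
```

Under √n-consistency and asymptotic normality, an interval of this form is asymptotically valid once the plug-in variance is replaced by a consistent estimator of the asymptotic variance.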
Cite
Text
Li, Jessie. "Asymptotics of K-Fold Cross Validation." Journal of Artificial Intelligence Research, 2023. doi:10.1613/JAIR.1.13974
Markdown
[Li, Jessie. "Asymptotics of K-Fold Cross Validation." Journal of Artificial Intelligence Research, 2023.](https://mlanthology.org/jair/2023/li2023jair-asymptotics/) doi:10.1613/JAIR.1.13974
BibTeX
@article{li2023jair-asymptotics,
title = {{Asymptotics of K-Fold Cross Validation}},
author = {Li, Jessie},
journal = {Journal of Artificial Intelligence Research},
year = {2023},
pages = {491--526},
doi = {10.1613/JAIR.1.13974},
volume = {78},
url = {https://mlanthology.org/jair/2023/li2023jair-asymptotics/}
}