Estimation of Prediction Error with Known Covariate Shift
Abstract
In supervised learning, estimating prediction error on unlabeled test data is an important task. Existing methods are usually built on the assumption that the training and test data are sampled from the same distribution, which is often violated in practice. As a result, traditional estimators like cross-validation (CV) will be biased, and this may lead to poor model selection. In this paper, we assume that we have a test dataset in which the feature values are available but not the outcome labels, and focus on a particular form of distributional shift: covariate shift. We propose an alternative method based on a parametric bootstrap of the target conditional error ErrX. Empirically, our method outperforms CV on both simulated and real data across different modeling tasks, and is comparable to state-of-the-art methods for image classification.
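To make the idea concrete, here is a minimal illustrative sketch (not the paper's exact estimator) of a parametric bootstrap for prediction error at known test covariates: fit a model on training data drawn from one covariate distribution, then repeatedly simulate outcomes from the fitted model at both the training and shifted test covariates, refit, and score. The linear model, distributions, and bootstrap size below are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariate shift: training covariates ~ N(0, 1), test covariates ~ N(1, 1).
n_train, n_test = 200, 200
x_train = rng.normal(0.0, 1.0, n_train)
x_test = rng.normal(1.0, 1.0, n_test)

# Hypothetical ground truth: a linear signal with Gaussian noise.
sigma = 1.0
y_train = 1.0 + 2.0 * x_train + rng.normal(0.0, sigma, n_train)

# Fit ordinary least squares on the training data.
X_train = np.column_stack([np.ones(n_train), x_train])
X_test = np.column_stack([np.ones(n_test), x_test])
beta_hat = np.linalg.lstsq(X_train, y_train, rcond=None)[0]

# Residual-based estimate of the noise variance (p = 2 fitted coefficients).
resid = y_train - X_train @ beta_hat
sigma2_hat = resid @ resid / (n_train - 2)

# Parametric bootstrap: treat the fitted model as the truth, simulate new
# training outcomes, refit, and score against simulated outcomes at the
# *fixed* test covariates (conditioning on the known shifted X).
B = 200
errs = np.empty(B)
for b in range(B):
    y_tr_b = X_train @ beta_hat + rng.normal(0.0, np.sqrt(sigma2_hat), n_train)
    beta_b = np.linalg.lstsq(X_train, y_tr_b, rcond=None)[0]
    y_te_b = X_test @ beta_hat + rng.normal(0.0, np.sqrt(sigma2_hat), n_test)
    errs[b] = np.mean((y_te_b - X_test @ beta_b) ** 2)

err_hat = errs.mean()
print(f"bootstrap estimate of test MSE under covariate shift: {err_hat:.3f}")
```

Because the simulated model is well specified here, the estimate lands near the irreducible noise variance plus a small refitting penalty; the point of the sketch is only the mechanics of conditioning on the shifted test covariates.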
Cite
Text
Xu and Tibshirani. "Estimation of Prediction Error with Known Covariate Shift." NeurIPS 2022 Workshops: DistShift, 2022.
Markdown
[Xu and Tibshirani. "Estimation of Prediction Error with Known Covariate Shift." NeurIPS 2022 Workshops: DistShift, 2022.](https://mlanthology.org/neuripsw/2022/xu2022neuripsw-estimation/)
BibTeX
@inproceedings{xu2022neuripsw-estimation,
title = {{Estimation of Prediction Error with Known Covariate Shift}},
author = {Xu, Hui and Tibshirani, Robert},
booktitle = {NeurIPS 2022 Workshops: DistShift},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/xu2022neuripsw-estimation/}
}