Cross-Validated Off-Policy Evaluation

Abstract

We study estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory, which provides only limited guidance to practitioners. We show how to use cross-validation for off-policy evaluation. This challenges a popular belief that cross-validation in off-policy evaluation is not feasible. We evaluate our method empirically and show that it addresses a variety of use cases.
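To make the idea concrete, here is a minimal sketch of how cross-validation could be used to pick an off-policy estimator. This is an illustrative assumption, not the paper's actual procedure: it scores each candidate estimator on training folds against an unbiased inverse-propensity-scoring (IPS) reference computed on the held-out fold, and selects the candidate with the lowest cross-validated squared error. The synthetic bandit data, the two candidate estimators (IPS and self-normalized IPS), and all variable names are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logged bandit feedback: 2 actions, known logging policy.
n = 4000
p_log = np.array([0.8, 0.2])      # logging policy (action probabilities)
p_tgt = np.array([0.3, 0.7])      # target policy we want to evaluate
mean_r = np.array([0.5, 0.9])     # true mean reward per action

a = rng.choice(2, size=n, p=p_log)            # logged actions
r = rng.binomial(1, mean_r[a]).astype(float)  # logged rewards
w = p_tgt[a] / p_log[a]                       # importance weights

true_value = float(p_tgt @ mean_r)            # 0.78, unknown in practice

def ips(w, r):
    # Inverse propensity scoring: unbiased but high variance.
    return float(np.mean(w * r))

def snips(w, r):
    # Self-normalized IPS: slightly biased but lower variance.
    return float(np.sum(w * r) / np.sum(w))

estimators = {"ips": ips, "snips": snips}

# K-fold selection: score each candidate on the training folds against
# an unbiased IPS reference computed on the held-out fold.
K = 5
folds = np.array_split(rng.permutation(n), K)
cv_err = {name: 0.0 for name in estimators}
for k in range(K):
    val = folds[k]
    trn = np.concatenate([folds[j] for j in range(K) if j != k])
    reference = ips(w[val], r[val])  # held-out unbiased estimate
    for name, est in estimators.items():
        cv_err[name] += (est(w[trn], r[trn]) - reference) ** 2 / K

best = min(cv_err, key=cv_err.get)
print(best, {k: round(v, 5) for k, v in cv_err.items()})
```

The key obstacle the sketch works around is that, unlike supervised learning, there is no ground-truth label to validate against; an unbiased (if noisy) estimate on held-out data stands in for it.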

Cite

Text

Cief et al. "Cross-Validated Off-Policy Evaluation." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I15.33765

Markdown

[Cief et al. "Cross-Validated Off-Policy Evaluation." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/cief2025aaai-cross/) doi:10.1609/AAAI.V39I15.33765

BibTeX

@inproceedings{cief2025aaai-cross,
  title     = {{Cross-Validated Off-Policy Evaluation}},
  author    = {Cief, Matej and Kveton, Branislav and Kompan, Michal},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {16073--16081},
  doi       = {10.1609/AAAI.V39I15.33765},
  url       = {https://mlanthology.org/aaai/2025/cief2025aaai-cross/}
}