Deeply-Debiased Off-Policy Interval Estimation
Abstract
Off-policy evaluation learns a target policy’s value from a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy’s value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.
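To make the CI construction concrete, below is a minimal sketch of the standard importance-sampling baseline with a normal-approximation interval, in a one-step (contextual-bandit) simplification. This is not the paper’s deeply-debiased estimator; the function name and toy data are hypothetical and serve only to illustrate what an off-policy point estimate plus CI looks like.

```python
import numpy as np
from scipy import stats

def ope_confidence_interval(rewards, behavior_probs, target_probs, alpha=0.05):
    """Point estimate and normal-approximation CI for a target policy's value
    from logged data, via importance sampling (a one-step simplification;
    NOT the deeply-debiased estimator proposed in the paper)."""
    # Importance weights correct for the mismatch between the behavior
    # policy that generated the data and the target policy being evaluated.
    weights = target_probs / behavior_probs
    values = weights * rewards            # per-sample value estimates
    n = len(values)
    point = values.mean()                 # IS point estimate of the policy value
    se = values.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    z = stats.norm.ppf(1 - alpha / 2)     # e.g., ~1.96 for a 95% CI
    return point, (point - z * se, point + z * se)

# Toy logged data: rewards and action probabilities under both policies.
rng = np.random.default_rng(0)
behavior_probs = rng.uniform(0.2, 0.8, size=1000)
target_probs = rng.uniform(0.2, 0.8, size=1000)
rewards = rng.binomial(1, 0.5, size=1000).astype(float)

est, (lo, hi) = ope_confidence_interval(rewards, behavior_probs, target_probs)
print(f"value estimate: {est:.3f}, 95% CI: ({lo:.3f}, {hi:.3f})")
```

The paper’s contribution is to replace this plain importance-sampling estimator with a deeply-debiased one whose CI remains valid under weaker conditions on the nuisance-function estimates; the normal-approximation interval shown here is the generic template such methods refine.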
Cite
Text
Shi et al. "Deeply-Debiased Off-Policy Interval Estimation." International Conference on Machine Learning, 2021.
Markdown
[Shi et al. "Deeply-Debiased Off-Policy Interval Estimation." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/shi2021icml-deeplydebiased/)
BibTeX
@inproceedings{shi2021icml-deeplydebiased,
title = {{Deeply-Debiased Off-Policy Interval Estimation}},
author = {Shi, Chengchun and Wan, Runzhe and Chernozhukov, Victor and Song, Rui},
booktitle = {International Conference on Machine Learning},
year = {2021},
pages = {9580--9591},
volume = {139},
url = {https://mlanthology.org/icml/2021/shi2021icml-deeplydebiased/}
}