Deeply-Debiased Off-Policy Interval Estimation

Abstract

Off-policy evaluation learns a target policy’s value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy’s value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.
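To make the setting concrete, below is a minimal, generic sketch of off-policy interval estimation: a plain importance-sampling point estimate with a Wald-type normal-approximation CI. This is not the paper's deeply-debiased procedure (see the linked repository for that); the data, policies, and sample size here are all hypothetical, chosen only to illustrate the point-estimate-plus-CI setup.

```python
import numpy as np

# Generic illustration: importance-sampling OPE with a normal-approximation CI.
# NOT the deeply-debiased estimator of Shi et al. (2021); hypothetical setup.
rng = np.random.default_rng(0)

n = 5000                                        # single-step episodes
actions = rng.integers(0, 2, size=n)            # behavior policy: uniform over {0, 1}
rewards = rng.normal(loc=actions, scale=1.0)    # action 1 yields higher mean reward

behavior_prob = 0.5                             # P(a) under the behavior policy
target_prob = np.where(actions == 1, 0.9, 0.1)  # target policy favors action 1

weights = target_prob / behavior_prob           # importance ratios
values = weights * rewards                      # per-episode value estimates

point_estimate = values.mean()                  # unbiased estimate of target value
std_error = values.std(ddof=1) / np.sqrt(n)

z = 1.96                                        # 95% normal quantile
lo, hi = point_estimate - z * std_error, point_estimate + z * std_error
print(f"point estimate: {point_estimate:.3f}, 95% CI: ({lo:.3f}, {hi:.3f})")
```

In this toy setup the true target-policy value is 0.9, and the interval should cover it roughly 95% of the time across repeated runs; the paper's contribution is a CI construction that remains valid and efficient under weaker assumptions than such a simple plug-in approach.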

Cite

Text

Shi et al. "Deeply-Debiased Off-Policy Interval Estimation." International Conference on Machine Learning, 2021.

Markdown

[Shi et al. "Deeply-Debiased Off-Policy Interval Estimation." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/shi2021icml-deeplydebiased/)

BibTeX

@inproceedings{shi2021icml-deeplydebiased,
  title     = {{Deeply-Debiased Off-Policy Interval Estimation}},
  author    = {Shi, Chengchun and Wan, Runzhe and Chernozhukov, Victor and Song, Rui},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {9580--9591},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/shi2021icml-deeplydebiased/}
}