Doubly Robust Off-Policy Evaluation with Shrinkage

Abstract

We propose a new framework for designing estimators for off-policy evaluation in contextual bandits. Our approach is based on the asymptotically optimal doubly robust estimator, but we shrink the importance weights to minimize a bound on the mean squared error, which results in a better bias-variance tradeoff in finite samples. We use this optimization-based framework to obtain three estimators: (a) a weight-clipping estimator, (b) a new weight-shrinkage estimator, and (c) the first shrinkage-based estimator for combinatorial action sets. Extensive experiments in both standard and combinatorial bandit benchmark problems show that our estimators are highly adaptive and typically outperform state-of-the-art methods.
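The abstract describes doubly robust (DR) estimation with shrunk importance weights. Below is a minimal sketch (not the authors' code) of the vanilla DR value estimate together with the weight-clipping variant, in which each importance weight w is replaced by min(w, lambda). The function and argument names are illustrative; the paper itself chooses the shrinkage level by minimizing an estimated bound on the mean squared error.

# Minimal doubly robust (DR) off-policy value estimate with optional
# weight clipping. Illustrative sketch only, not the authors' implementation.
import numpy as np

def dr_estimate(pi_probs, mu_probs, rewards, q_hat_pi, q_hat_taken, clip=None):
    """Doubly robust estimate of the value of a target policy pi.

    pi_probs    : (n,) target-policy probability of the logged action a_i
    mu_probs    : (n,) logging-policy probability of the logged action a_i
    rewards     : (n,) observed rewards r_i
    q_hat_pi    : (n,) model-based estimate of E_{a ~ pi}[q_hat(x_i, a)]
    q_hat_taken : (n,) model estimate q_hat(x_i, a_i) for the logged action
    clip        : optional threshold lambda; weights are shrunk to min(w, lambda)
    """
    w = pi_probs / mu_probs                  # importance weights
    if clip is not None:
        w = np.minimum(w, clip)              # clipped (shrunk) weights
    # DR = direct model estimate + importance-weighted correction on logged actions
    return np.mean(q_hat_pi + w * (rewards - q_hat_taken))

Tuning the clipping threshold trades the bias introduced by shrinking large weights against the variance those weights would otherwise cause; with clip=None the sketch reduces to the standard DR estimator.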

Cite

Text

Su et al. "Doubly Robust Off-Policy Evaluation with Shrinkage." International Conference on Machine Learning, 2020.

Markdown

[Su et al. "Doubly Robust Off-Policy Evaluation with Shrinkage." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/su2020icml-doubly/)

BibTeX

@inproceedings{su2020icml-doubly,
  title     = {{Doubly Robust Off-Policy Evaluation with Shrinkage}},
  author    = {Su, Yi and Dimakopoulou, Maria and Krishnamurthy, Akshay and Dud{\'i}k, Miroslav},
  booktitle = {International Conference on Machine Learning},
  year      = {2020},
  pages     = {9167--9176},
  volume    = {119},
  url       = {https://mlanthology.org/icml/2020/su2020icml-doubly/}
}