Doubly Robust Off-Policy Evaluation with Shrinkage
Abstract
We propose a new framework for designing estimators for off-policy evaluation in contextual bandits. Our approach is based on the asymptotically optimal doubly robust estimator, but we shrink the importance weights to minimize a bound on the mean squared error, which results in a better bias-variance tradeoff in finite samples. We use this optimization-based framework to obtain three estimators: (a) a weight-clipping estimator, (b) a new weight-shrinkage estimator, and (c) the first shrinkage-based estimator for combinatorial action sets. Extensive experiments in both standard and combinatorial bandit benchmark problems show that our estimators are highly adaptive and typically outperform state-of-the-art methods.
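Below is a minimal sketch of the weight-clipping variant mentioned in the abstract: a doubly robust value estimate in which the importance weights are shrunk by clipping at a threshold before the correction term is applied. This is an illustrative assumption of how such an estimator can be computed, not the authors' implementation; the function and argument names (`dr_clipped_estimate`, `lam`, the reward-model inputs) are hypothetical.

```python
import numpy as np

def dr_clipped_estimate(rewards, logged_probs, target_probs,
                        reward_model_logged, reward_model_target, lam=np.inf):
    """Doubly robust off-policy value estimate with clipped importance weights.

    Illustrative sketch; all argument names are assumptions:
      rewards              -- observed rewards r_i for the logged actions
      logged_probs         -- mu(a_i | x_i), logging-policy propensities
      target_probs         -- pi(a_i | x_i), target-policy probabilities
      reward_model_logged  -- r_hat(x_i, a_i), model prediction for the logged action
      reward_model_target  -- E_{a ~ pi}[r_hat(x_i, a)], model value under the target policy
      lam                  -- clipping threshold; lam = inf recovers the plain DR estimator
    """
    weights = target_probs / logged_probs      # importance weights pi/mu
    weights = np.minimum(weights, lam)         # shrink by clipping at lam
    # Direct-method term plus weighted correction on the logged actions.
    return np.mean(reward_model_target + weights * (rewards - reward_model_logged))
```

Smaller values of `lam` trade variance for bias; the paper's framework chooses such shrinkage to minimize a bound on the mean squared error.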
Cite
Text
Su et al. "Doubly Robust Off-Policy Evaluation with Shrinkage." International Conference on Machine Learning, 2020.
Markdown
[Su et al. "Doubly Robust Off-Policy Evaluation with Shrinkage." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/su2020icml-doubly/)
BibTeX
@inproceedings{su2020icml-doubly,
title = {{Doubly Robust Off-Policy Evaluation with Shrinkage}},
author = {Su, Yi and Dimakopoulou, Maria and Krishnamurthy, Akshay and Dudik, Miroslav},
booktitle = {International Conference on Machine Learning},
year = {2020},
pages = {9167-9176},
volume = {119},
url = {https://mlanthology.org/icml/2020/su2020icml-doubly/}
}