Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling

Abstract

We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces where conventional importance-weighting approaches suffer from excessive variance. To circumvent this variance issue, we propose a new estimator, called OffCEM, that is based on the conjunct effect model (CEM), a novel decomposition of the causal effect into a cluster effect and a residual effect. OffCEM applies importance weighting only to action clusters and addresses the residual causal effect through model-based reward estimation. We show that the proposed estimator is unbiased under a new assumption, called local correctness, which only requires that the residual-effect model preserves the relative expected reward differences of the actions within each cluster. To best leverage the CEM and local correctness, we also propose a new two-step procedure for performing model-based estimation that minimizes bias in the first step and variance in the second step. We find that the resulting OffCEM estimator substantially improves bias and variance compared to a range of conventional estimators. Experiments demonstrate that OffCEM provides substantial improvements in OPE especially in the presence of many actions.
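As a rough illustration of the estimator described above, the following Python sketch applies importance weighting only at the cluster level and adds a model-based term for the residual effect. The function name `offcem_sketch`, its array-based arguments, and the fixed, context-independent action-to-cluster map `cluster_of` are assumptions made for this example. The way the terms are combined is one reading of the abstract, not the authors' reference implementation.

```python
import numpy as np


def offcem_sketch(rewards, actions, pi0, pi, f_hat, cluster_of):
    """Cluster-level importance weighting plus a model-based residual term.

    All argument names are hypothetical and chosen for this sketch:
      rewards    : (n,)   observed rewards in the logged data
      actions    : (n,)   indices of the logged actions
      pi0        : (n, k) logging-policy action probabilities per context
      pi         : (n, k) target-policy action probabilities per context
      f_hat      : (n, k) model-based reward predictions per context and action
      cluster_of : (k,)   cluster index assigned to each action
    Assumes the logging policy puts nonzero mass on every observed cluster.
    """
    n = rewards.shape[0]
    n_clusters = int(cluster_of.max()) + 1

    # One-hot membership matrix of shape (k, n_clusters).
    membership = np.eye(n_clusters)[cluster_of]

    # Marginalize action probabilities into cluster probabilities.
    pi_cluster = pi @ membership    # (n, n_clusters)
    pi0_cluster = pi0 @ membership  # (n, n_clusters)

    rows = np.arange(n)
    logged_clusters = cluster_of[actions]

    # Importance weights are defined on clusters, not on individual actions.
    weights = pi_cluster[rows, logged_clusters] / pi0_cluster[rows, logged_clusters]

    # Part of the observed reward not explained by the model, for the logged action.
    residual = rewards - f_hat[rows, actions]

    # Expected model prediction under the target policy.
    model_term = (pi * f_hat).sum(axis=1)

    return float(np.mean(weights * residual + model_term))
```

Because the weights depend only on cluster probabilities, their range is governed by the number of clusters rather than the number of actions, which is the source of the variance reduction the abstract describes.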

Cite

Text

Saito et al. "Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling." International Conference on Machine Learning, 2023.

Markdown

[Saito et al. "Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/saito2023icml-offpolicy/)

BibTeX

@inproceedings{saito2023icml-offpolicy,
  title     = {{Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling}},
  author    = {Saito, Yuta and Ren, Qingyang and Joachims, Thorsten},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {29734--29759},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/saito2023icml-offpolicy/}
}