Off-Policy Evaluation with Policy-Dependent Optimization Response
Abstract
The intersection of causal inference and machine learning for decision-making is rapidly expanding, but the default decision criterion remains an average of individual causal outcomes across a population. In practice, various operational restrictions ensure that a decision-maker's utility is not realized as an average but rather as an output of a downstream decision-making problem (such as matching, assignment, network flow, or minimizing predictive risk). In this work, we develop a new framework for off-policy evaluation with policy-dependent linear optimization responses: causal outcomes introduce stochasticity in objective function coefficients. Under this framework, a decision-maker's utility depends on the policy-dependent optimization, which introduces a fundamental challenge of optimization bias even for the case of policy evaluation. We construct unbiased estimators for the policy-dependent estimand by a perturbation method, and discuss asymptotic variance properties for a set of adjusted plug-in estimators. Lastly, attaining unbiased policy evaluation allows for policy optimization: we provide a general algorithm for optimizing causal interventions. We corroborate our theoretical results with numerical simulations.
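The optimization bias mentioned in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's estimator or its perturbation method; it only shows, under simplifying assumptions, why plugging noisy coefficient estimates into a linear program gives a biased estimate of the optimal value. The feasible set (the probability simplex), the Gaussian noise model, and the function names `plug_in_value` and `simulate_plug_in_bias` are all hypothetical choices made for illustration.

```python
import random
import statistics


def plug_in_value(c_hat):
    """Optimal value of min_x c_hat^T x over the probability simplex.

    A linear objective over the simplex is minimized at a vertex,
    so the optimal value is simply the smallest coefficient.
    """
    return min(c_hat)


def simulate_plug_in_bias(true_c, noise_sd=1.0, n_reps=5000, seed=0):
    """Monte Carlo estimate of E[min_x c_hat^T x] - min_x c^T x.

    Assumes (for illustration only) that each estimated coefficient
    equals the true coefficient plus independent Gaussian noise.
    """
    rng = random.Random(seed)
    true_value = plug_in_value(true_c)
    estimates = []
    for _ in range(n_reps):
        c_hat = [c + rng.gauss(0.0, noise_sd) for c in true_c]
        estimates.append(plug_in_value(c_hat))
    return statistics.mean(estimates) - true_value


# Five decisions with identical true cost: the true optimal value is 0,
# yet the plug-in value is systematically too optimistic (negative bias).
bias = simulate_plug_in_bias([0.0] * 5)
print(f"estimated bias of the plug-in value: {bias:.3f}")
```

Because the minimum is a concave function of the coefficients, the expected plug-in value can only fall at or below the true optimal value, even when every coefficient estimate is itself unbiased. With a single feasible decision there is nothing to optimize over, and the bias disappears.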
Cite
Text
Guo et al. "Off-Policy Evaluation with Policy-Dependent Optimization Response." Neural Information Processing Systems, 2022.
Markdown
[Guo et al. "Off-Policy Evaluation with Policy-Dependent Optimization Response." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/guo2022neurips-offpolicy/)
BibTeX
@inproceedings{guo2022neurips-offpolicy,
title = {{Off-Policy Evaluation with Policy-Dependent Optimization Response}},
author = {Guo, Wenshuo and Jordan, Michael I. and Zhou, Angela},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/guo2022neurips-offpolicy/}
}