Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing

Abstract

In this paper we consider the problem of evaluating one digital marketing policy (or more generally, a policy for an MDP with unknown transition and reward functions) using data collected from the execution of a different policy. We call this problem off-policy policy evaluation. Existing methods for off-policy policy evaluation assume that the transition and reward functions of the MDP are stationary, an assumption that is typically false, particularly for digital marketing applications. As a result, existing off-policy policy evaluation methods are reactive to nonstationarity: they slowly correct for changes only after those changes occur. We argue that off-policy policy evaluation for nonstationary MDPs can be phrased as a time series prediction problem, which results in predictive methods that can anticipate changes before they happen. We therefore propose a synthesis of existing off-policy policy evaluation methods with existing time series prediction methods, which we show results in a drastic reduction of mean squared error when evaluating policies using a real digital marketing data set.
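
The idea of treating off-policy evaluation as time series prediction can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes ordinary importance sampling as the off-policy estimator and a least-squares polynomial trend as a stand-in for the time series predictor, and the function names `importance_sampling_estimate` and `forecast_next` are hypothetical.

```python
import numpy as np


def importance_sampling_estimate(rewards, behavior_probs, target_probs, gamma=1.0):
    """Ordinary importance sampling estimate of the target policy's return
    from a single episode collected under the behavior policy.

    behavior_probs[t] and target_probs[t] are the probabilities each policy
    assigned to the action actually taken at step t (assumed to be logged).
    """
    rewards = np.asarray(rewards, dtype=float)
    behavior_probs = np.asarray(behavior_probs, dtype=float)
    target_probs = np.asarray(target_probs, dtype=float)
    # Product of per-step likelihood ratios reweights the whole episode.
    ratio = np.prod(target_probs / behavior_probs)
    discounts = gamma ** np.arange(len(rewards))
    return float(ratio * np.sum(discounts * rewards))


def forecast_next(estimates, degree=1):
    """Fit a polynomial trend to a time-ordered sequence of off-policy
    estimates and extrapolate one step ahead.

    A simple stand-in for a time series model; any forecaster could be
    substituted here.
    """
    estimates = np.asarray(estimates, dtype=float)
    t = np.arange(len(estimates))
    coeffs = np.polyfit(t, estimates, degree)
    return float(np.polyval(coeffs, len(estimates)))
```

For a sequence of per-period estimates that drifts upward, such as `[1.0, 2.0, 3.0, 4.0]`, a stationary method that averages the estimates reports 2.5, while `forecast_next` extrapolates the trend to roughly 5.0 for the next period. This contrast between averaging past estimates and predicting the next one is the distinction the abstract draws between reactive and predictive evaluation.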

Cite

Text

Thomas et al. "Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.19104

Markdown

[Thomas et al. "Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/thomas2017aaai-predictive/) doi:10.1609/AAAI.V31I1.19104

BibTeX

@inproceedings{thomas2017aaai-predictive,
  title     = {{Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing}},
  author    = {Thomas, Philip S. and Theocharous, Georgios and Ghavamzadeh, Mohammad and Durugkar, Ishan and Brunskill, Emma},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {4740--4745},
  doi       = {10.1609/AAAI.V31I1.19104},
  url       = {https://mlanthology.org/aaai/2017/thomas2017aaai-predictive/}
}