Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing
Abstract
In this paper we consider the problem of evaluating one digital marketing policy (or more generally, a policy for an MDP with unknown transition and reward functions) using data collected from the execution of a different policy. We call this problem off-policy policy evaluation. Existing methods for off-policy policy evaluation assume that the transition and reward functions of the MDP are stationary, an assumption that is typically false, particularly for digital marketing applications. This means that existing off-policy policy evaluation methods are reactive to nonstationarity, in that they slowly correct for changes after they occur. We argue that off-policy policy evaluation for nonstationary MDPs can be phrased as a time series prediction problem, which results in predictive methods that can anticipate changes before they happen. We therefore propose a synthesis of existing off-policy policy evaluation methods with existing time series prediction methods, which we show results in a drastic reduction of mean squared error when evaluating policies using a real digital marketing data set.
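The contrast between reactive and predictive evaluation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes ordinary (undiscounted) importance sampling as the off-policy estimator and a least-squares linear trend as the time series predictor, both chosen here only as simple stand-ins. The function names and the episode format (a list of `(state, action, reward)` tuples, with policies given as functions returning action probabilities) are hypothetical.

```python
def importance_sampling_return(episode, pi_e, pi_b):
    """Ordinary importance sampling estimate from one episode.

    episode: list of (state, action, reward) tuples collected under pi_b.
    pi_e, pi_b: functions (state, action) -> probability of that action
    under the evaluation and behavior policy (assumed interface).
    """
    weight = 1.0
    total_reward = 0.0
    for state, action, reward in episode:
        weight *= pi_e(state, action) / pi_b(state, action)
        total_reward += reward  # undiscounted return, an assumption
    return weight * total_reward


def reactive_estimate(is_returns):
    """Stationary-style estimate: the plain average of past estimates."""
    return sum(is_returns) / len(is_returns)


def predictive_estimate(is_returns, horizon=1):
    """Fit a least-squares line to the estimates, indexed by episode
    number, and extrapolate `horizon` episodes past the last one."""
    n = len(is_returns)
    t_mean = (n - 1) / 2
    y_mean = sum(is_returns) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(is_returns))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var if var > 0 else 0.0
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + horizon)


# Toy sequence of per-episode estimates with an upward drift.
estimates = [1.0, 2.0, 3.0, 4.0]
print(reactive_estimate(estimates))    # average of the past
print(predictive_estimate(estimates))  # extrapolated next value
```

On a drifting sequence like this one, the average lags behind the trend while the extrapolation follows it, which is the reactive versus predictive distinction the abstract describes. Any other off-policy estimator or forecasting model could take the place of the two stand-ins.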
Cite
Text
Thomas et al. "Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.19104
Markdown
[Thomas et al. "Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/thomas2017aaai-predictive/) doi:10.1609/AAAI.V31I1.19104
BibTeX
@inproceedings{thomas2017aaai-predictive,
title = {{Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing}},
author = {Thomas, Philip S. and Theocharous, Georgios and Ghavamzadeh, Mohammad and Durugkar, Ishan and Brunskill, Emma},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2017},
pages = {4740--4745},
doi = {10.1609/AAAI.V31I1.19104},
url = {https://mlanthology.org/aaai/2017/thomas2017aaai-predictive/}
}