Conditional Importance Sampling for Off-Policy Learning
Abstract
The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
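To make the core idea concrete, below is a minimal sketch in a one-step (bandit-like) setting: ordinary importance sampling weights each observed reward by the ratio pi(a)/mu(a), while a conditional variant replaces that ratio with an empirical estimate of its conditional expectation given a chosen statistic. The Bernoulli reward model, the choice of the observed reward as the conditioning statistic, and the bin-averaging estimator are illustrative assumptions for this sketch, not the paper's exact construction.

```python
# Minimal sketch of conditional importance sampling (CIS) in a one-step setting.
# Assumptions (not from the paper): Bernoulli rewards, conditioning on the
# observed reward, and a simple bin-averaging estimate of E[rho | R = r].
import numpy as np

rng = np.random.default_rng(0)

# Behaviour policy mu and target policy pi over 3 actions.
mu = np.array([0.5, 0.3, 0.2])
pi = np.array([0.2, 0.3, 0.5])

# Bernoulli reward with a per-action success probability.
p_reward = np.array([0.9, 0.5, 0.1])

n = 100_000
actions = rng.choice(3, size=n, p=mu)
rewards = rng.binomial(1, p_reward[actions]).astype(float)
rho = pi[actions] / mu[actions]          # per-sample importance ratios

# Ordinary importance sampling estimate of E_pi[R].
ordinary_is = np.mean(rho * rewards)

# Conditional importance sampling: replace each ratio by an empirical
# estimate of E[rho | R = r], here the mean ratio within each reward value.
cond_rho = np.empty_like(rho)
for r in np.unique(rewards):
    mask = rewards == r
    cond_rho[mask] = rho[mask].mean()
conditional_is = np.mean(cond_rho * rewards)

true_value = np.dot(pi, p_reward)
print(f"true E_pi[R]   = {true_value:.3f}")
print(f"ordinary IS    = {ordinary_is:.3f}")
print(f"conditional IS = {conditional_is:.3f}")
```

With the exact conditional expectation, the conditional estimator has the same mean as ordinary importance sampling but no larger variance (a Rao-Blackwell-style argument); the bin-averaging used here is only a simple empirical stand-in for that conditional expectation.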
Cite
Text
Rowland et al. "Conditional Importance Sampling for Off-Policy Learning." Artificial Intelligence and Statistics, 2020.
Markdown
[Rowland et al. "Conditional Importance Sampling for Off-Policy Learning." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/rowland2020aistats-conditional/)
BibTeX
@inproceedings{rowland2020aistats-conditional,
title = {{Conditional Importance Sampling for Off-Policy Learning}},
author = {Rowland, Mark and Harutyunyan, Anna and van Hasselt, Hado and Borsa, Diana and Schaul, Tom and Munos, R{\'e}mi and Dabney, Will},
booktitle = {Artificial Intelligence and Statistics},
year = {2020},
pages = {45--55},
volume = {108},
url = {https://mlanthology.org/aistats/2020/rowland2020aistats-conditional/}
}