Conditional Importance Sampling for Off-Policy Learning

Abstract

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
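
As a rough sketch of the idea (the paper's exact constructions and conditioning variables differ), replacing an importance sampling ratio \rho = \pi(a \mid s) / \mu(a \mid s) by its conditional expectation given a statistic Z preserves unbiasedness whenever the weighted return G is a function of Z, by the tower property:

\mathbb{E}_\mu[\rho \, G] \;=\; \mathbb{E}_\mu\big[\mathbb{E}_\mu[\rho \, G \mid Z]\big] \;=\; \mathbb{E}_\mu\big[\mathbb{E}_\mu[\rho \mid Z] \, G\big], \qquad G = g(Z).

By the same Rao-Blackwell argument, the conditional estimator \mathbb{E}_\mu[\rho \mid Z] \, G has variance no greater than that of \rho \, G, which is the sense in which conditioning on a statistic can reduce the variance of off-policy importance sampling.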

Cite

Text

Rowland et al. "Conditional Importance Sampling for Off-Policy Learning." Artificial Intelligence and Statistics, 2020.

Markdown

[Rowland et al. "Conditional Importance Sampling for Off-Policy Learning." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/rowland2020aistats-conditional/)

BibTeX

@inproceedings{rowland2020aistats-conditional,
  title     = {{Conditional Importance Sampling for Off-Policy Learning}},
  author    = {Rowland, Mark and Harutyunyan, Anna and van Hasselt, Hado and Borsa, Diana and Schaul, Tom and Munos, Remi and Dabney, Will},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {45--55},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/rowland2020aistats-conditional/}
}