Adaptive Trade-Offs in Off-Policy Learning

Abstract

A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms. In this paper, we take a unifying view of this space of algorithms, and consider their trade-offs of three fundamental quantities: update variance, fixed-point bias, and contraction rate. This leads to new perspectives on existing methods, and also naturally yields novel algorithms for off-policy evaluation and control. We develop one such algorithm, C-trace, demonstrating that it is able to more efficiently make these trade-offs than existing methods in use, and that it can be scaled to yield state-of-the-art performance in large-scale environments.

Cite

Text

Rowland et al. "Adaptive Trade-Offs in Off-Policy Learning." Artificial Intelligence and Statistics, 2020.

Markdown

[Rowland et al. "Adaptive Trade-Offs in Off-Policy Learning." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/rowland2020aistats-adaptive/)

BibTeX

@inproceedings{rowland2020aistats-adaptive,
  title     = {{Adaptive Trade-Offs in Off-Policy Learning}},
  author    = {Rowland, Mark and Dabney, Will and Munos, Remi},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {34--44},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/rowland2020aistats-adaptive/}
}