Adaptive Trade-Offs in Off-Policy Learning
Abstract
A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms. In this paper, we take a unifying view of this space of algorithms, and consider their trade-offs of three fundamental quantities: update variance, fixed-point bias, and contraction rate. This leads to new perspectives on existing methods, and also naturally yields novel algorithms for off-policy evaluation and control. We develop one such algorithm, C-trace, demonstrating that it is able to more efficiently make these trade-offs than existing methods in use, and that it can be scaled to yield state-of-the-art performance in large-scale environments.
Cite
Text
Rowland et al. "Adaptive Trade-Offs in Off-Policy Learning." Artificial Intelligence and Statistics, 2020.
Markdown
[Rowland et al. "Adaptive Trade-Offs in Off-Policy Learning." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/rowland2020aistats-adaptive/)
BibTeX
@inproceedings{rowland2020aistats-adaptive,
title = {{Adaptive Trade-Offs in Off-Policy Learning}},
author = {Rowland, Mark and Dabney, Will and Munos, Remi},
booktitle = {Artificial Intelligence and Statistics},
year = {2020},
pages = {34--44},
volume = {108},
url = {https://mlanthology.org/aistats/2020/rowland2020aistats-adaptive/}
}