Marginalized Operators for Off-Policy Reinforcement Learning
Abstract
In this work, we propose marginalized operators, a new class of off-policy evaluation operators for reinforcement learning. Marginalized operators strictly generalize generic multi-step operators, recovering methods such as Retrace as special cases. Marginalized operators also suggest a form of sample-based estimates with potential variance reduction compared to sample-based estimates of the original multi-step operators. We show that the estimates for marginalized operators can be computed in a scalable way, generalizing prior results on marginalized importance sampling as special cases. Finally, we empirically demonstrate that marginalized operators provide performance gains on off-policy evaluation problems and in downstream policy optimization algorithms.
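For readers unfamiliar with the multi-step operators being generalized, the Retrace operator of Munos et al. (2016) evaluates a target policy $\pi$ from data generated by a behavior policy $\mu$ using per-step trace coefficients. A minimal sketch of that operator, and of the marginalized variant suggested by the abstract, is given below; the marginal weight $w$ and this exact form are illustrative assumptions, not the paper's precise definitions.

\[
\mathcal{R}Q(x,a) = Q(x,a) + \mathbb{E}_{\mu}\!\left[\sum_{t \ge 0} \gamma^{t}\Bigl(\prod_{s=1}^{t} c_{s}\Bigr)\bigl(r_{t} + \gamma\, \mathbb{E}_{\pi}\!\left[Q(x_{t+1}, \cdot)\right] - Q(x_{t}, a_{t})\bigr)\right],
\qquad
c_{s} = \lambda \min\!\left(1, \tfrac{\pi(a_{s} \mid x_{s})}{\mu(a_{s} \mid x_{s})}\right).
\]

A marginalized operator replaces the trajectory-dependent product $\prod_{s=1}^{t} c_{s}$ with a state-action-dependent weight $w(x_{t}, a_{t})$ that matches the product in expectation, which is the source of the potential variance reduction:

\[
\mathcal{M}Q(x,a) = Q(x,a) + \mathbb{E}_{\mu}\!\left[\sum_{t \ge 0} \gamma^{t}\, w(x_{t}, a_{t})\,\bigl(r_{t} + \gamma\, \mathbb{E}_{\pi}\!\left[Q(x_{t+1}, \cdot)\right] - Q(x_{t}, a_{t})\bigr)\right].
\]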
Cite
Text
Tang et al. "Marginalized Operators for Off-Policy Reinforcement Learning." Artificial Intelligence and Statistics, 2022.

Markdown
[Tang et al. "Marginalized Operators for Off-Policy Reinforcement Learning." Artificial Intelligence and Statistics, 2022.](https://mlanthology.org/aistats/2022/tang2022aistats-marginalized/)

BibTeX
@inproceedings{tang2022aistats-marginalized,
title = {{Marginalized Operators for Off-Policy Reinforcement Learning}},
author = {Tang, Yunhao and Rowland, Mark and Munos, Remi and Valko, Michal},
booktitle = {Artificial Intelligence and Statistics},
year = {2022},
pages = {655-679},
volume = {151},
url = {https://mlanthology.org/aistats/2022/tang2022aistats-marginalized/}
}