Value-Aware Importance Weighting for Off-Policy Reinforcement Learning

Abstract

Importance sampling is a central idea underlying off-policy prediction in reinforcement learning. It provides a strategy for re-weighting samples drawn from one distribution to form unbiased estimates under another. However, importance sampling weights tend to have high variance, often leading to stability issues in practice. In this work, we consider a broader class of importance weights to correct samples in off-policy learning. We propose the use of value-aware importance weights, which take into account the sample space to provide lower-variance, but still unbiased, estimates under a target distribution. We derive how such weights can be computed, and detail key properties of the resulting importance weights. We then extend several reinforcement learning prediction algorithms to the off-policy setting with these weights, and evaluate them empirically.
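
For context, below is a minimal sketch of the conventional baseline the abstract refers to: off-policy TD(0) prediction corrected with the ordinary per-step importance sampling ratio rho = pi(a|s) / b(a|s). It is not the paper's value-aware weighting; the toy MDP, policies, and hyperparameters are assumptions made purely for illustration.

# Standard per-step importance sampling for off-policy TD(0) prediction.
# This illustrates the ordinary ratio rho = pi(a|s) / b(a|s), whose high
# variance motivates the paper; it is NOT the value-aware method itself.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2
gamma, alpha = 0.9, 0.1

# Toy dynamics (assumed for illustration): P[s, a] is a distribution over
# next states, R[s, a] is the expected reward for taking action a in state s.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

# Behaviour policy b (uniform) generates the data; target policy pi is evaluated.
b = np.full((n_states, n_actions), 1.0 / n_actions)
pi = rng.dirichlet(np.ones(n_actions), size=n_states)

V = np.zeros(n_states)  # value estimates for the target policy pi
s = 0
for _ in range(50_000):
    a = rng.choice(n_actions, p=b[s])   # act according to the behaviour policy
    rho = pi[s, a] / b[s, a]            # ordinary importance sampling ratio
    r = R[s, a] + rng.normal(scale=0.1)
    s_next = rng.choice(n_states, p=P[s, a])

    # Importance-weighted TD(0) update: rho corrects the action-distribution
    # mismatch but can substantially inflate the variance of the update.
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * rho * td_error
    s = s_next

print("Estimated V_pi:", np.round(V, 3))

The value-aware weights proposed in the paper replace this fixed ratio with weights that also depend on the sampled quantity being estimated, trading the standard ratio's variance for lower-variance corrections while remaining unbiased; see the paper for the derivation.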

Cite

Text

De Asis et al. "Value-Aware Importance Weighting for Off-Policy Reinforcement Learning." Proceedings of The 2nd Conference on Lifelong Learning Agents, 2023.

Markdown

[De Asis et al. "Value-Aware Importance Weighting for Off-Policy Reinforcement Learning." Proceedings of The 2nd Conference on Lifelong Learning Agents, 2023.](https://mlanthology.org/collas/2023/deasis2023collas-valueaware/)

BibTeX

@inproceedings{deasis2023collas-valueaware,
  title     = {{Value-Aware Importance Weighting for Off-Policy Reinforcement Learning}},
  author    = {De Asis, Kristopher and Graves, Eric and Sutton, Richard S.},
  booktitle = {Proceedings of The 2nd Conference on Lifelong Learning Agents},
  year      = {2023},
  pages     = {745--763},
  volume    = {232},
  url       = {https://mlanthology.org/collas/2023/deasis2023collas-valueaware/}
}