Learning Models and Evaluating Policies with Offline Off-Policy Data Under Partial Observability
Abstract
Models in reinforcement learning are often estimated from offline data, which in many real-world scenarios is subject to partial observability. In this work, we study the challenges that emerge from using models estimated from partially-observable offline data for policy evaluation. Notably, a complete definition of the models includes dependence on the data-collecting policy. To address this issue, we introduce a method for model estimation that incorporates importance weighting in the model learning process. The off-policy samples are reweighted to be reflective of their probabilities under a different policy, such that the resultant model is a consistent estimator of the off-policy model and provides consistent estimates of the expected off-policy return. This is a crucial step towards the reliable and responsible use of models learned under partial observability, particularly in scenarios where inaccurate policy evaluation can have catastrophic consequences. We empirically demonstrate the efficacy of our method and its resilience to common approximations such as weight clipping on a range of domains with diverse types of partial observability.
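The core idea described above — reweighting off-policy samples by importance ratios before fitting the model, optionally with weight clipping — can be sketched as follows. This is a minimal illustrative example with tabular observations and actions; the function name, the count-based estimator, and the per-step (rather than per-trajectory) ratios are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def weighted_transition_model(obs, act, next_obs, pi_b, pi_e,
                              n_obs, n_act, clip=None):
    """Importance-weighted empirical estimate of P(o' | o, a).

    obs, act, next_obs: integer index arrays of offline transitions
        collected under a behavior policy pi_b.
    pi_b, pi_e: arrays of shape (n_obs, n_act) giving each policy's
        action probabilities.
    clip: optional upper bound on the importance weights (the common
        weight-clipping approximation mentioned in the abstract).
    """
    # Per-sample importance ratio pi_e(a|o) / pi_b(a|o).
    w = pi_e[obs, act] / pi_b[obs, act]
    if clip is not None:
        w = np.minimum(w, clip)  # clip large weights to reduce variance
    # Accumulate weighted transition counts instead of raw counts.
    counts = np.zeros((n_obs, n_act, n_obs))
    np.add.at(counts, (obs, act, next_obs), w)
    # Normalize into conditional probabilities where (o, a) was visited.
    totals = counts.sum(axis=-1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts),
                     where=totals > 0)
```

With `pi_e = pi_b` all weights equal one and this reduces to the ordinary maximum-likelihood count model; with differing policies, the reweighting makes the estimator reflect the data distribution that `pi_e` would induce.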
Cite

Text

Chaudhari et al. "Learning Models and Evaluating Policies with Offline Off-Policy Data Under Partial Observability." NeurIPS 2023 Workshops: ReALML, 2023.

Markdown

[Chaudhari et al. "Learning Models and Evaluating Policies with Offline Off-Policy Data Under Partial Observability." NeurIPS 2023 Workshops: ReALML, 2023.](https://mlanthology.org/neuripsw/2023/chaudhari2023neuripsw-learning/)

BibTeX
@inproceedings{chaudhari2023neuripsw-learning,
title = {{Learning Models and Evaluating Policies with Offline Off-Policy Data Under Partial Observability}},
author = {Chaudhari, Shreyas and Thomas, Philip S. and da Silva, Bruno Castro},
booktitle = {NeurIPS 2023 Workshops: ReALML},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/chaudhari2023neuripsw-learning/}
}