Learning Belief Representations for Partially Observable Deep RL
Abstract
Many important real-world Reinforcement Learning (RL) problems involve partial observability and require policies with memory. Unfortunately, standard deep RL algorithms for partially observable settings typically condition on the full history of interactions and are notoriously difficult to train. We propose a novel deep, partially observable RL algorithm based on modelling belief states, a technique typically used when solving tabular POMDPs but one that has traditionally been difficult to apply to more complex environments. Our approach simplifies policy learning by leveraging state information that is available at training time but may not be available at deployment time. We do so in two ways: first, we decouple belief state modelling (via unsupervised learning) from policy optimization (via RL); and second, we propose a representation learning approach to capture a compact set of reward-relevant features of the state. Experiments demonstrate the efficacy of our approach on partially observable domains requiring information seeking and long-term memory.
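To make the two ideas in the abstract concrete, below is a minimal, hypothetical sketch (in PyTorch) of how a belief representation might be learned from observation-action histories with an auxiliary objective that uses privileged state information available only at training time, while the policy is optimized by RL on top of that representation. The module names, network sizes, state-prediction loss, and REINFORCE-style policy loss are all illustrative assumptions and are not taken from the paper.

```python
# Hypothetical sketch: decoupled belief-representation learning and policy optimization.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BeliefEncoder(nn.Module):
    """Recurrent encoder mapping a history of (observation, previous action) pairs to a belief vector."""

    def __init__(self, obs_dim, act_dim, belief_dim=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, belief_dim, batch_first=True)

    def forward(self, obs_seq, prev_act_seq):
        # obs_seq: [B, T, obs_dim], prev_act_seq: [B, T, act_dim]
        h, _ = self.rnn(torch.cat([obs_seq, prev_act_seq], dim=-1))
        return h  # [B, T, belief_dim] belief representation at every step


class StateDecoder(nn.Module):
    """Predicts the privileged environment state from the belief (used at training time only)."""

    def __init__(self, belief_dim, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(belief_dim, 128), nn.ReLU(), nn.Linear(128, state_dim))

    def forward(self, belief):
        return self.net(belief)


class Policy(nn.Module):
    """Policy conditioned only on the belief representation, not on the raw history."""

    def __init__(self, belief_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(belief_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))

    def forward(self, belief):
        return torch.distributions.Categorical(logits=self.net(belief))


def training_step(encoder, decoder, policy, enc_opt, pol_opt, batch):
    # batch fields (all hypothetical): obs [B,T,obs_dim], prev_act [B,T,act_dim],
    # state [B,T,state_dim] (privileged), act [B,T] (actions taken), ret [B,T] (returns).
    belief = encoder(batch["obs"], batch["prev_act"])

    # (1) Representation learning: fit the belief to reward-relevant state features,
    # assuming privileged states are observable during training.
    repr_loss = F.mse_loss(decoder(belief), batch["state"])
    enc_opt.zero_grad()
    repr_loss.backward()
    enc_opt.step()

    # (2) Policy optimization (a simple REINFORCE-style loss here) on the detached
    # belief, so RL gradients do not shape the representation: the two objectives are decoupled.
    dist = policy(belief.detach())
    pol_loss = -(dist.log_prob(batch["act"]) * batch["ret"]).mean()
    pol_opt.zero_grad()
    pol_loss.backward()
    pol_opt.step()
    return repr_loss.item(), pol_loss.item()
```

In this sketch, detaching the belief before the policy loss is what separates the two learning problems: the encoder is driven purely by the (assumed) state-prediction objective, and the policy treats the resulting representation as a fixed input, mirroring the decoupling described in the abstract.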
Cite
Text
Wang et al. "Learning Belief Representations for Partially Observable Deep RL." International Conference on Machine Learning, 2023.
Markdown
[Wang et al. "Learning Belief Representations for Partially Observable Deep RL." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/wang2023icml-learning-a/)
BibTeX
@inproceedings{wang2023icml-learning-a,
title = {{Learning Belief Representations for Partially Observable Deep RL}},
author = {Wang, Andrew and Li, Andrew C. and Klassen, Toryn Q. and Toro Icarte, Rodrigo and McIlraith, Sheila A.},
booktitle = {International Conference on Machine Learning},
year = {2023},
pages = {35970--35988},
volume = {202},
url = {https://mlanthology.org/icml/2023/wang2023icml-learning-a/}
}