Learning Bayes-Optimal Representation in Partially Observable Environments via Meta-Reinforcement Learning with Predictive Coding

Po-Chen Kuo, Han Hou, Will Dabney, Edgar Y. Walker

NeurIPSW 2024


Abstract

Learning a compact representation that summarizes history is essential for decision-making, planning, and generalization in partially observable environments. Memory-based meta-reinforcement learning (RL) has been shown to learn near-Bayes-optimal policies under partial observability. However, its learned representations can fail to be equivalent to minimally sufficient, Bayes-optimal belief states, which may hinder robustness and generalization. To overcome this challenge, we propose a meta-RL framework that learns an explicit belief representation by incorporating self-supervised predictive modules inspired by predictive coding in the neuroscience literature. Our approach outperforms conventional meta-RL by generating more interpretable and task-relevant representations that better capture the underlying task structure and dynamics. Using state machine simulations, we demonstrate that the learned representations are closer to equivalence with Bayes-optimal states and are linked to improved future prediction and policy learning. Our results suggest that self-supervised future prediction is a promising technique for enhancing representation learning in partially observable environments.
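The abstract measures learned representations against Bayes-optimal belief states, that is, the posterior over hidden task states given the history so far. As a hedged illustration of what such a belief state is (not the authors' model, agent, or environment), the sketch below runs exact Bayesian filtering on a hypothetical two-state task: a hidden variable says which of two arms is currently rewarding, it persists each step with probability 0.9, and pulling arm 0 yields a reward with probability 0.8 in state 0 and 0.2 in state 1. The names `TRANSITION`, `LIKELIHOOD`, `belief_update`, and `filter_history`, and all numbers, are illustrative assumptions.

```python
from typing import Dict, List, Sequence

# Hypothetical two-state task (illustrative assumption, not from the paper):
# hidden state s in {0, 1} says which arm is currently rewarding.
# TRANSITION[s][s_next] = P(s_next | s); the state persists with probability 0.9.
TRANSITION: List[List[float]] = [[0.9, 0.1],
                                 [0.1, 0.9]]

# LIKELIHOOD[o][s] = P(o | s) when pulling arm 0; o = 1 means "rewarded".
LIKELIHOOD: Dict[int, List[float]] = {
    1: [0.8, 0.2],
    0: [0.2, 0.8],
}


def belief_update(belief: Sequence[float],
                  transition: Sequence[Sequence[float]],
                  likelihood: Sequence[float]) -> List[float]:
    """One step of Bayesian filtering: predict, then correct and renormalize."""
    n = len(belief)
    # Prediction step: b_pred(s') = sum_s P(s' | s) * b(s)
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction step: weight each state by P(o | s') and normalize to sum to 1
    unnormalized = [predicted[s2] * likelihood[s2] for s2 in range(n)]
    z = sum(unnormalized)
    if z == 0:
        raise ValueError("observation has zero probability under the belief")
    return [u / z for u in unnormalized]


def filter_history(prior: Sequence[float],
                   observations: Sequence[int]) -> List[List[float]]:
    """Fold a whole observation history into a sequence of belief vectors."""
    beliefs: List[List[float]] = []
    belief = list(prior)
    for o in observations:
        belief = belief_update(belief, TRANSITION, LIKELIHOOD[o])
        beliefs.append(belief)
    return beliefs


if __name__ == "__main__":
    # Uniform prior, then observe: rewarded, rewarded, not rewarded.
    for t, b in enumerate(filter_history([0.5, 0.5], [1, 1, 0]), start=1):
        print(t, [round(p, 3) for p in b])
```

However long the history grows, the belief stays a fixed-size vector that is a sufficient statistic of that history for this toy task, which is the sense in which a Bayes-optimal belief is a compact summary. The framework described in the abstract instead learns its representation with a memory-based meta-RL agent and self-supervised predictive modules; this sketch does not implement that and only shows the target notion of a belief state.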


Cite

Text

Kuo et al. "Learning Bayes-Optimal Representation in Partially Observable Environments via Meta-Reinforcement Learning with Predictive Coding." NeurIPS 2024 Workshops: NeuroAI, 2024.

Markdown

[Kuo et al. "Learning Bayes-Optimal Representation in Partially Observable Environments via Meta-Reinforcement Learning with Predictive Coding." NeurIPS 2024 Workshops: NeuroAI, 2024.](https://mlanthology.org/neuripsw/2024/kuo2024neuripsw-learning/)

BibTeX

@inproceedings{kuo2024neuripsw-learning,
  title     = {{Learning Bayes-Optimal Representation in Partially Observable Environments via Meta-Reinforcement Learning with Predictive Coding}},
  author    = {Kuo, Po-Chen and Hou, Han and Dabney, Will and Walker, Edgar Y.},
  booktitle = {NeurIPS 2024 Workshops: NeuroAI},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/kuo2024neuripsw-learning/}
}