Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning

Abstract

As a marriage between offline RL and meta-RL, the advent of offline meta-reinforcement learning (OMRL) has shown great promise in enabling RL agents to multi-task and quickly adapt while acquiring knowledge safely. Among existing approaches, context-based OMRL (COMRL), a popular paradigm, aims to learn a universal policy conditioned on effective task representations. In this work, by examining several key milestones in the field of COMRL, we propose to integrate these seemingly independent methodologies into a unified framework. Most importantly, we show that the pre-existing COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $M$ and its latent representation $Z$ by implementing various approximate bounds. Such theoretical insight offers ample design freedom for novel algorithms. As demonstrations, we propose a supervised and a self-supervised implementation of $I(Z; M)$, and empirically show that the corresponding optimization algorithms exhibit remarkable generalization across a broad spectrum of RL benchmarks, context shift scenarios, data qualities and deep learning architectures. This work lays the information theoretic foundation for COMRL methods, leading to a better understanding of task representation learning in the context of reinforcement learning. Given its generality, we envision our framework as a promising offline pre-training paradigm of foundation models for decision making.
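
Illustration

As an illustration of the kind of self-supervised surrogate for $I(Z; M)$ the abstract refers to, below is a minimal sketch of an InfoNCE-style contrastive lower bound on the mutual information between tasks and their latent embeddings. This is not the authors' code; the encoder architecture, hyperparameters, and the use of two context windows from the same task as a positive pair are illustrative assumptions.

# Minimal sketch (assumed implementation, not the paper's code): a contrastive
# InfoNCE-style lower bound on I(Z; M). Z is a latent task embedding produced
# by a context encoder; contexts drawn from the same task M form positive pairs,
# other tasks in the batch act as negatives.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextEncoder(nn.Module):
    """Maps a context of (s, a, r, s') transitions to a task embedding z."""

    def __init__(self, transition_dim: int, latent_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, context_len, transition_dim); average-pool over transitions
        return self.net(context).mean(dim=1)


def info_nce_loss(z_anchor: torch.Tensor, z_positive: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss: row i of z_anchor and z_positive encode the same task;
    all other rows in the batch serve as negatives."""
    z_anchor = F.normalize(z_anchor, dim=-1)
    z_positive = F.normalize(z_positive, dim=-1)
    logits = z_anchor @ z_positive.t() / temperature        # (batch, batch)
    labels = torch.arange(z_anchor.size(0), device=z_anchor.device)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    batch, context_len, transition_dim, latent_dim = 16, 32, 20, 8
    encoder = ContextEncoder(transition_dim, latent_dim)
    # Two disjoint context windows sampled from the same task per batch element
    # (random tensors here stand in for offline transition data).
    ctx_a = torch.randn(batch, context_len, transition_dim)
    ctx_b = torch.randn(batch, context_len, transition_dim)
    loss = info_nce_loss(encoder(ctx_a), encoder(ctx_b))
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")

Minimizing this loss maximizes a lower bound on $I(Z; M)$, which is the sense in which a contrastive objective can serve as a self-supervised implementation of the framework's objective; the supervised variant described in the abstract would instead use task identity labels directly.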

Cite

Text

Li et al. "Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning." Neural Information Processing Systems, 2024. doi:10.52202/079017-2408

Markdown

[Li et al. "Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/li2024neurips-information/) doi:10.52202/079017-2408

BibTeX

@inproceedings{li2024neurips-information,
  title     = {{Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning}},
  author    = {Li, Lanqing and Zhang, Hai and Zhang, Xinyu and Zhu, Shatong and Yu, Yang and Zhao, Junqiao and Heng, Pheng-Ann},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2408},
  url       = {https://mlanthology.org/neurips/2024/li2024neurips-information/}
}