Transferable Reinforcement Learning via Generalized Occupancy Models

Abstract

Intelligent agents must be generalists, capable of quickly adapting to various tasks. In reinforcement learning (RL), model-based RL learns a dynamics model of the world, in principle enabling transfer to arbitrary reward functions through planning. However, autoregressive model rollouts suffer from compounding error, making model-based RL ineffective for long-horizon problems. Successor features offer an alternative by modeling a policy's long-term state occupancy, reducing policy evaluation under new tasks to linear reward regression. Yet policy improvement with successor features can be challenging. This work proposes a novel class of models, generalized occupancy models (GOMs), which learn a distribution of successor features from a stationary dataset, along with a policy that acts to realize different successor features. These models can quickly select the optimal action for arbitrary new tasks. By directly modeling long-term outcomes in the dataset, GOMs avoid compounding error while enabling rapid transfer across reward functions. We present a practical instantiation of GOMs using diffusion models and show their efficacy as a new class of transferable models, both theoretically and empirically across various simulated robotics problems.
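
As an illustration of the statement that successor features reduce policy evaluation under a new task to linear reward regression, the following is a minimal sketch and not the authors' implementation. It assumes rewards are linear in a known feature map, r(s) = w · phi(s), and uses made-up two-dimensional features; the function names (`discounted_sum`, `fit_linear_reward_2d`, `evaluate`) and all numbers are hypothetical. The diffusion-based GOM model described in the abstract is not reproduced here.

```python
def discounted_sum(vectors, gamma):
    """Return sum_t gamma^t * vectors[t] for a list of equal-length vectors."""
    total = [0.0] * len(vectors[0])
    for t, vec in enumerate(vectors):
        for i, x in enumerate(vec):
            total[i] += (gamma ** t) * x
    return total


def fit_linear_reward_2d(features, rewards):
    """Least-squares fit of w in r ~ w . phi for 2-D features (normal equations)."""
    a = sum(f[0] * f[0] for f in features)
    b = sum(f[0] * f[1] for f in features)
    d = sum(f[1] * f[1] for f in features)
    u = sum(f[0] * r for f, r in zip(features, rewards))
    v = sum(f[1] * r for f, r in zip(features, rewards))
    det = a * d - b * b  # assumed non-zero, i.e. the features span both dimensions
    return [(d * u - b * v) / det, (a * v - b * u) / det]


def evaluate(psi, w):
    """Value of a policy under reward weights w, given its successor features psi."""
    return sum(p * wi for p, wi in zip(psi, w))


# Hypothetical features phi(s_t) visited along one trajectory of a fixed policy.
trajectory_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gamma = 0.5

# Successor features: the discounted sum of features, independent of any reward.
psi = discounted_sum(trajectory_features, gamma)

# A new task arrives as reward labels on the same states (toy values).
new_task_rewards = [2.0, -1.0, 1.0]
w = fit_linear_reward_2d(trajectory_features, new_task_rewards)

# Evaluating the policy on the new task is now a single dot product.
value = evaluate(psi, w)
```

In this toy setting the dot product `evaluate(psi, w)` agrees with the discounted sum of the new task's rewards along the trajectory, so no rollout of a dynamics model is needed once `psi` is known. Only the regression for `w` has to be repeated when the reward function changes.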

Cite

Text

Zhu et al. "Transferable Reinforcement Learning via Generalized Occupancy Models." ICML 2024 Workshops: ARLET, 2024.

Markdown

[Zhu et al. "Transferable Reinforcement Learning via Generalized Occupancy Models." ICML 2024 Workshops: ARLET, 2024.](https://mlanthology.org/icmlw/2024/zhu2024icmlw-transferable/)

BibTeX

@inproceedings{zhu2024icmlw-transferable,
  title     = {{Transferable Reinforcement Learning via Generalized Occupancy Models}},
  author    = {Zhu, Chuning and Wang, Xinqi and Han, Tyler and Du, Simon Shaolei and Gupta, Abhishek},
  booktitle = {ICML 2024 Workshops: ARLET},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/zhu2024icmlw-transferable/}
}