Off-Policy Reinforcement Learning with Model-Based Exploration Augmentation

Abstract

Exploration is crucial in Reinforcement Learning (RL), as it enables the agent to build an understanding of the environment for better decision-making. Existing exploration methods fall into two paradigms: active exploration, which injects stochasticity into the policy but struggles in high-dimensional environments, and passive exploration, which manages the replay buffer to prioritize under-explored regions but lacks sample diversity. To address this limitation of passive exploration, we propose Modelic Generative Exploration (MoGE), which augments exploration by generating under-explored critical states and synthesizing dynamics-consistent experiences. MoGE consists of two components: (1) a diffusion-based generator that produces critical states under the guidance of entropy and TD error, and (2) a one-step imagination world model that constructs critical transitions from these states for agent learning. Our method is simple to implement and integrates seamlessly with mainstream off-policy RL algorithms without structural modifications. Experiments on OpenAI Gym and the DeepMind Control Suite demonstrate that MoGE, used as an exploration augmentation, significantly improves sample efficiency and performance on complex tasks.
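
To make the two-component design concrete, below is a minimal Python/PyTorch sketch of how such an augmentation loop might look. It is an illustration under stated assumptions, not the paper's implementation: the generator interface (diffusion_generator.sample), the replay buffer API, the residual world-model architecture, and the real-to-synthetic mixing ratio are all hypothetical placeholders.

import torch
import torch.nn as nn

# Sketch of a MoGE-style augmentation loop. Every class, function, and
# hyperparameter name here is an illustrative assumption; the paper's
# actual interfaces may differ.

class OneStepWorldModel(nn.Module):
    """One-step imagination: predict reward and next state from (s, a)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + state_dim),
        )

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        reward, delta = out[..., :1], out[..., 1:]
        # Residual prediction: next state = current state + learned delta.
        return reward, state + delta


def synthesize_transitions(diffusion_generator, world_model, policy, n):
    """Build dynamics-consistent synthetic transitions from generated
    critical states (generator and policy interfaces are assumed)."""
    # 1. Sample under-explored critical states; the generator is assumed
    #    to have been trained with entropy / TD-error guidance, as the
    #    abstract describes.
    states = diffusion_generator.sample(n)
    # 2. Let the current policy act on the imagined states.
    actions = policy(states)
    # 3. One-step imagination completes the transition (s, a, r, s').
    with torch.no_grad():
        rewards, next_states = world_model(states, actions)
    return states, actions, rewards, next_states


def mixed_batch(replay_buffer, synthetic, real_fraction=0.9, batch_size=256):
    """Blend real replay data with synthetic transitions so the base
    off-policy learner (e.g., SAC or TD3) needs no structural changes."""
    n_real = int(real_fraction * batch_size)
    real = replay_buffer.sample(n_real)  # assumed buffer API
    syn = tuple(x[: batch_size - n_real] for x in synthetic)
    return tuple(torch.cat([r, s], dim=0) for r, s in zip(real, syn))

The key design point the abstract emphasizes is visible here: the augmentation operates entirely at the level of the training batch, so the underlying off-policy algorithm consumes a mix of real and imagined transitions without any change to its networks or update rules.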

Cite

Text

Wang et al. "Off-Policy Reinforcement Learning with Model-Based Exploration Augmentation." Advances in Neural Information Processing Systems, 2025.

Markdown

[Wang et al. "Off-Policy Reinforcement Learning with Model-Based Exploration Augmentation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/wang2025neurips-offpolicy/)

BibTeX

@inproceedings{wang2025neurips-offpolicy,
  title     = {{Off-Policy Reinforcement Learning with Model-Based Exploration Augmentation}},
  author    = {Wang, Likun and Zhang, Xiangteng and Wang, Yinuo and Zhan, Guojian and Wang, Wenxuan and Gao, Haoyu and Duan, Jingliang and Li, Shengbo Eben},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/wang2025neurips-offpolicy/}
}