GOPlan: Goal-Conditioned Offline Reinforcement Learning by Planning with Learned Models

Abstract

Offline Goal-Conditioned RL (GCRL) offers a feasible paradigm for learning general-purpose policies from diverse and multi-task offline datasets. Despite notable recent progress, the predominant offline GCRL methods, mainly model-free, face constraints in handling limited data and generalizing to unseen goals. In this work, we propose Goal-conditioned Offline Planning (GOPlan), a novel model-based framework that contains two key phases: (1) pretraining a prior policy capable of capturing the multi-modal action distribution within the multi-goal dataset; (2) employing the reanalysis method with planning to generate imagined trajectories for fine-tuning policies. Specifically, we base the prior policy on an advantage-weighted conditioned generative adversarial network, which facilitates distinct mode separation, mitigating the pitfalls of out-of-distribution (OOD) actions. For further policy optimization, the reanalysis method generates high-quality imaginary data by planning with learned models for both intra-trajectory and inter-trajectory goals. With thorough experimental evaluations, we demonstrate that GOPlan achieves state-of-the-art performance on various offline multi-goal navigation and manipulation tasks. Moreover, our results highlight the superior ability of GOPlan to handle small data budgets and generalize to OOD goals.
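
The abstract does not specify how the advantage weighting is computed. As an illustration only, the sketch below shows one form commonly used in advantage-weighted methods: each sample's loss is scaled by a clipped, exponentiated advantage, so that actions with higher estimated advantage contribute more to training. The function names and the `beta` and `max_weight` parameters are hypothetical and are not taken from the paper.

```python
import math


def advantage_weights(advantages, beta=1.0, max_weight=20.0):
    """Return clipped exponentiated-advantage weights (assumed form, not the paper's).

    beta is a hypothetical temperature; max_weight is a hypothetical clip value.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    weights = []
    for a in advantages:
        # Cap the exponent first so very large advantages cannot overflow exp().
        exponent = min(a / beta, math.log(max_weight))
        weights.append(math.exp(exponent))
    return weights


def weighted_mean_loss(per_sample_losses, advantages, beta=1.0):
    """Average per-sample losses, each scaled by its advantage weight."""
    if len(per_sample_losses) != len(advantages):
        raise ValueError("losses and advantages must have the same length")
    weights = advantage_weights(advantages, beta)
    total = sum(w * loss for w, loss in zip(weights, per_sample_losses))
    return total / len(per_sample_losses)
```

With this weighting, a sample with zero advantage keeps its original loss, samples with negative advantage are down-weighted, and the clip keeps a few very high-advantage samples from dominating a batch.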

Cite

Text

Wang et al. "GOPlan: Goal-Conditioned Offline Reinforcement Learning by Planning with Learned Models." Transactions on Machine Learning Research, 2024.

Markdown

[Wang et al. "GOPlan: Goal-Conditioned Offline Reinforcement Learning by Planning with Learned Models." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/wang2024tmlr-goplan/)

BibTeX

@article{wang2024tmlr-goplan,
  title     = {{GOPlan: Goal-Conditioned Offline Reinforcement Learning by Planning with Learned Models}},
  author    = {Wang, Mianchu and Yang, Rui and Chen, Xi and Sun, Hao and Fang, Meng and Montana, Giovanni},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/wang2024tmlr-goplan/}
}