CLaP: Conditional Latent Planners for Offline Reinforcement Learning
Abstract
Recent work has formulated offline reinforcement learning (RL) as a sequence modeling problem, benefiting from the simplicity and scalability of the Transformer architecture. However, sequence models struggle to model trajectories that are long-horizon or involve complicated environment dynamics. We propose CLaP (Conditional Latent Planners), which learns a simple goal-conditioned latent space from offline agent behavior and incrementally decodes good actions from a latent plan. We evaluate our method on continuous control domains from the D4RL benchmark. Compared to non-sequential models and return-conditioned sequential models, CLaP shows competitive or better performance across continuous control tasks. It performs particularly well in environments with complex transition dynamics, with up to a $+149.8\%$ performance increase. Our results suggest that decision-making is easier with simplified latent dynamics that model behavior as goal-conditioned.
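To make the two-stage idea in the abstract concrete, below is a minimal sketch (not the authors' implementation) of a goal-conditioned latent planner: an encoder forms a latent plan from the current state and a goal, and a decoder incrementally produces actions from that plan. All module names, network sizes, and dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the paper does not specify these.
STATE_DIM, GOAL_DIM, LATENT_DIM, ACTION_DIM = 17, 17, 32, 6


class ConditionalLatentPlanner(nn.Module):
    """Sketch: encode (state, goal) into a latent plan, then decode
    actions from the plan one step at a time."""

    def __init__(self):
        super().__init__()
        # Encoder: maps the current state and a goal to a latent plan.
        self.encoder = nn.Sequential(
            nn.Linear(STATE_DIM + GOAL_DIM, 128),
            nn.ReLU(),
            nn.Linear(128, LATENT_DIM),
        )
        # Decoder: maps the latent plan and current state to an action.
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM + STATE_DIM, 128),
            nn.ReLU(),
            nn.Linear(128, ACTION_DIM),
        )

    def plan(self, state, goal):
        return self.encoder(torch.cat([state, goal], dim=-1))

    def act(self, latent_plan, state):
        return self.decoder(torch.cat([latent_plan, state], dim=-1))


# Usage: form a latent plan once, then decode actions incrementally.
model = ConditionalLatentPlanner()
state = torch.randn(1, STATE_DIM)
goal = torch.randn(1, GOAL_DIM)
z = model.plan(state, goal)
for _ in range(5):
    action = model.act(z, state)        # decode next action from the plan
    state = torch.randn(1, STATE_DIM)   # placeholder for an environment step
```

In this sketch the latent plan is held fixed while actions are decoded; whether and how the plan is refreshed during a rollout is a design choice not specified by the abstract.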
Cite
Text
Shin and Wang. "CLaP: Conditional Latent Planners for Offline Reinforcement Learning." NeurIPS 2022 Workshops: FMDM, 2022.

Markdown
[Shin and Wang. "CLaP: Conditional Latent Planners for Offline Reinforcement Learning." NeurIPS 2022 Workshops: FMDM, 2022.](https://mlanthology.org/neuripsw/2022/shin2022neuripsw-clap/)

BibTeX
@inproceedings{shin2022neuripsw-clap,
title = {{CLaP: Conditional Latent Planners for Offline Reinforcement Learning}},
author = {Shin, Harry Donghyeop and Wang, Rose E},
booktitle = {NeurIPS 2022 Workshops: FMDM},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/shin2022neuripsw-clap/}
}