Model-Based Offline Planning with Trajectory Pruning

Abstract

Recent offline reinforcement learning (RL) studies have made considerable progress toward making RL usable in real-world systems by learning policies from pre-collected datasets without environment interaction. Unfortunately, existing offline RL methods still face many practical challenges in real-world system control tasks, such as computational restrictions during agent training and the need for extra control flexibility. The model-based planning framework provides an attractive alternative. However, most model-based planning algorithms are not designed for offline settings. Simply combining the ingredients of offline RL with existing methods either yields over-restrictive planning or leads to inferior performance. We propose a new lightweight model-based offline planning framework, called MOPP, which tackles the dilemma between the restrictions of offline learning and high-performance planning. MOPP encourages more aggressive trajectory rollout guided by the behavior policy learned from data, and prunes out problematic trajectories to avoid potential out-of-distribution samples. Experimental results show that MOPP provides competitive performance compared with existing model-based offline planning and RL approaches.
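The abstract names two mechanisms only at a high level: rollouts guided by a learned behavior policy but allowed to be "more aggressive" than pure imitation, and pruning of trajectories that risk out-of-distribution samples. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The abstract does not specify these details, so every choice here is hypothetical: a scalar state and action, a Gaussian behavior policy whose standard deviation is widened by `std_scale`, ensemble disagreement as the uncertainty measure with a pruning threshold `limit`, and an MPPI-style exponential return weighting controlled by `kappa`.

```python
# Hypothetical sketch: all names, parameters and the toy problem below are
# illustrative assumptions, not taken from the MOPP paper or its code.
import math
import random
from typing import Callable, List, Sequence, Tuple

# (return, max ensemble disagreement, action sequence) for one imagined rollout
Rollout = Tuple[float, float, List[float]]


def rollout(state: float,
            behavior_policy: Callable[[float], Tuple[float, float]],
            models: Sequence[Callable[[float, float], float]],
            reward_fn: Callable[[float, float], float],
            horizon: int, std_scale: float, rng: random.Random) -> Rollout:
    """Roll out one trajectory through the model ensemble, sampling actions
    from a behavior policy whose std is widened by `std_scale`."""
    s, ret, max_unc, actions = state, 0.0, 0.0, []
    for _ in range(horizon):
        mu, sigma = behavior_policy(s)
        a = rng.gauss(mu, std_scale * sigma)  # wider than pure imitation
        preds = [m(s, a) for m in models]
        mean = sum(preds) / len(preds)
        # Assumed uncertainty proxy: largest deviation of any member from the mean.
        max_unc = max(max_unc, max(abs(p - mean) for p in preds))
        ret += reward_fn(s, a)
        actions.append(a)
        s = mean  # continue from the ensemble-mean prediction
    return ret, max_unc, actions


def prune(rollouts: List[Rollout], limit: float) -> List[Rollout]:
    """Drop rollouts whose uncertainty exceeds `limit`; if none survive,
    keep only the least uncertain one so the planner still returns an action."""
    kept = [r for r in rollouts if r[1] <= limit]
    return kept if kept else [min(rollouts, key=lambda r: r[1])]


def combine(kept: List[Rollout], kappa: float) -> float:
    """Return-weighted average of first actions (MPPI-style exponential weights)."""
    best = max(r[0] for r in kept)  # subtract the max return for numerical stability
    weights = [math.exp(kappa * (r[0] - best)) for r in kept]
    return sum(w * r[2][0] for w, r in zip(weights, kept)) / sum(weights)


def plan(state, behavior_policy, models, reward_fn, n_traj=64, horizon=5,
         std_scale=2.0, limit=0.5, kappa=1.0, seed=0) -> float:
    """Sample rollouts, prune uncertain ones, and return the first action to execute."""
    rng = random.Random(seed)
    rollouts = [rollout(state, behavior_policy, models, reward_fn,
                        horizon, std_scale, rng) for _ in range(n_traj)]
    return combine(prune(rollouts, limit), kappa)


# Hypothetical toy problem: drive a scalar state toward 0.
behavior = lambda s: (-0.5 * s, 0.1)  # stand-in for a behavior policy learned from data
ensemble = [lambda s, a, k=k: s + a + 0.05 * k * a * a for k in range(3)]
reward = lambda s, a: -(s * s)
action = plan(1.0, behavior, ensemble, reward)
```

In this toy ensemble the members disagree more as `|a|` grows, so raising `std_scale` explores further from the behavior policy while `limit` cuts off rollouts that stray into regions where the models disagree. The fallback in `prune` is a design choice of this sketch that keeps the planner from returning nothing.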

Cite


Zhan et al. "Model-Based Offline Planning with Trajectory Pruning." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/516


BibTeX

@inproceedings{zhan2022ijcai-model,
  title     = {{Model-Based Offline Planning with Trajectory Pruning}},
  author    = {Zhan, Xianyuan and Zhu, Xiangyu and Xu, Haoran},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {3716--3722},
  doi       = {10.24963/IJCAI.2022/516},
  url       = {https://mlanthology.org/ijcai/2022/zhan2022ijcai-model/}
}