Constrained Decision Transformer for Offline Safe Reinforcement Learning
Abstract
Safe reinforcement learning (RL) trains a constraint-satisfying policy by interacting with the environment. We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. We study the offline safe RL problem from a novel multi-objective optimization perspective and propose the $\epsilon$-reducible concept to characterize problem difficulty. The inherent trade-off between safety and task performance inspires us to propose the constrained decision transformer (CDT) approach, which can dynamically adjust this trade-off during deployment. Extensive experiments show the advantages of the proposed method in learning an adaptive, safe, robust, and high-reward policy. CDT outperforms its variants and strong offline safe RL baselines by a large margin with the same hyperparameters across all tasks, while retaining zero-shot adaptation to different constraint thresholds, making our approach more suitable for real-world RL under constraints.
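The adaptability described above comes from conditioning the policy not only on a target return but also on a target cost budget, so changing the cost input at deployment shifts the safety/reward trade-off without retraining. The sketch below is a minimal, PyTorch-style illustration of that conditioning interface; the class name, token layout, and hyperparameters are assumptions for illustration and are not taken from the authors' code.

```python
import torch
import torch.nn as nn

class ConstrainedDecisionTransformer(nn.Module):
    """Minimal sketch: a causal transformer policy conditioned on a target
    return-to-go AND a target cost-to-go, so the safety/reward trade-off can
    be shifted at deployment simply by changing the cost target."""

    def __init__(self, state_dim, act_dim, embed_dim=128,
                 n_layers=3, n_heads=4, max_len=1000):
        super().__init__()
        self.embed_rtg = nn.Linear(1, embed_dim)        # return-to-go token
        self.embed_ctg = nn.Linear(1, embed_dim)        # cost-to-go token
        self.embed_state = nn.Linear(state_dim, embed_dim)
        self.embed_action = nn.Linear(act_dim, embed_dim)
        self.embed_time = nn.Embedding(max_len, embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.predict_action = nn.Linear(embed_dim, act_dim)

    def forward(self, states, actions, returns_to_go, costs_to_go, timesteps):
        # states: (B, T, state_dim); actions: (B, T, act_dim)
        # returns_to_go, costs_to_go: (B, T, 1); timesteps: (B, T) int64
        B, T = states.shape[0], states.shape[1]
        t = self.embed_time(timesteps)
        # Interleave (return, cost, state, action) tokens per timestep.
        tokens = torch.stack([
            self.embed_rtg(returns_to_go) + t,
            self.embed_ctg(costs_to_go) + t,
            self.embed_state(states) + t,
            self.embed_action(actions) + t,
        ], dim=2).reshape(B, 4 * T, -1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(4 * T)
        h = self.transformer(tokens, mask=causal_mask)
        h = h.reshape(B, T, 4, -1)
        # Predict a_t from the state token, which attends to the return and
        # cost targets of the current and previous steps.
        return self.predict_action(h[:, :, 2])


# Zero-shot trade-off adjustment at deployment: keep the return target fixed
# and lower the cost target to request more conservative behavior.
model = ConstrainedDecisionTransformer(state_dim=8, act_dim=2)
states = torch.randn(1, 5, 8)
actions = torch.zeros(1, 5, 2)
rtg = torch.full((1, 5, 1), 300.0)   # desired episodic return
ctg = torch.full((1, 5, 1), 10.0)    # cost budget; e.g. 10.0 vs. 40.0
timesteps = torch.arange(5).unsqueeze(0)
action_preds = model(states, actions, rtg, ctg, timesteps)  # (1, 5, 2)
```

The cost-to-go input is what distinguishes this sketch from a plain return-conditioned decision transformer: it acts as the knob the abstract refers to when it says the trade-off can be adjusted during deployment.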
Cite
Text

Liu et al. "Constrained Decision Transformer for Offline Safe Reinforcement Learning." International Conference on Machine Learning, 2023.

Markdown

[Liu et al. "Constrained Decision Transformer for Offline Safe Reinforcement Learning." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/liu2023icml-constrained/)

BibTeX
@inproceedings{liu2023icml-constrained,
title = {{Constrained Decision Transformer for Offline Safe Reinforcement Learning}},
author = {Liu, Zuxin and Guo, Zijian and Yao, Yihang and Cen, Zhepeng and Yu, Wenhao and Zhang, Tingnan and Zhao, Ding},
booktitle = {International Conference on Machine Learning},
year = {2023},
pages = {21611--21630},
volume = {202},
url = {https://mlanthology.org/icml/2023/liu2023icml-constrained/}
}