Don’t Trade Off Safety: Diffusion Regularization for Constrained Offline RL

Abstract

Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints. We focus on an offline setting where the agent learns from a fixed dataset, a common requirement in realistic tasks where unsafe exploration must be prevented. To address this setting, we propose Diffusion-Regularized Constrained Offline Reinforcement Learning (DRCORL), which first uses a diffusion model to capture the behavioral policy from offline data and then extracts a simplified policy to enable efficient inference. We further apply gradient manipulation for safety adaptation, balancing the reward objective and constraint satisfaction. This approach leverages high-quality offline data while incorporating safety requirements. Empirical results show that DRCORL achieves reliable safety performance, fast inference, and strong reward outcomes across robot learning tasks. Compared to existing safe offline RL methods, it consistently meets cost limits and performs well with the same hyperparameters, indicating practical applicability in real-world scenarios. We open-source our implementation at https://github.com/JamesJunyuGuo/DRCORL.
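The "gradient manipulation" mentioned above refers to a conflict-aware update that weighs the reward gradient against the cost gradient. The Python snippet below is a minimal, hypothetical sketch of one common rule of this kind (projecting away the cost-increasing component of the reward gradient whenever the cost estimate exceeds its limit); the function name, arguments, and projection rule are illustrative assumptions, not the paper's exact procedure, which is specified in the authors' implementation.

import numpy as np

def manipulate_gradient(g_reward, g_cost, cost_value, cost_limit, eps=1e-8):
    """Combine reward and cost gradients into a single policy update (sketch)."""
    # Constraint satisfied: take a pure reward-ascent step.
    if cost_value <= cost_limit:
        return g_reward
    conflict = np.dot(g_reward, g_cost)
    # If ascending the reward would also increase the expected cost
    # (positive inner product with the cost gradient), remove that
    # component so the update is cost-neutral to first order.
    if conflict > 0.0:
        return g_reward - (conflict / (np.dot(g_cost, g_cost) + eps)) * g_cost
    return g_reward

# Toy usage with two-dimensional gradients and a violated cost limit:
g_r = np.array([1.0, 0.5])
g_c = np.array([1.0, -1.0])
update = manipulate_gradient(g_r, g_c, cost_value=30.0, cost_limit=25.0)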

Cite

Text

Guo et al. "Don’t Trade Off Safety: Diffusion Regularization for Constrained Offline RL." Advances in Neural Information Processing Systems, 2025.

Markdown

[Guo et al. "Don’t Trade Off Safety: Diffusion Regularization for Constrained Offline RL." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/guo2025neurips-dont/)

BibTeX

@inproceedings{guo2025neurips-dont,
  title     = {{Don’t Trade Off Safety: Diffusion Regularization for Constrained Offline RL}},
  author    = {Guo, Junyu and Zheng, Zhi and Ying, Donghao and Jin, Ming and Gu, Shangding and Spanos, Costas and Lavaei, Javad},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/guo2025neurips-dont/}
}