Boundary-to-Region Supervision for Offline Safe Reinforcement Learning
Abstract
Offline safe reinforcement learning aims to learn policies that satisfy predefined safety constraints from static datasets. Existing sequence-model-based methods condition action generation on symmetric input tokens for return-to-go (RTG) and cost-to-go (CTG), neglecting their intrinsic asymmetry: RTG serves as a flexible performance target, while CTG should represent a rigid safety boundary. This symmetric conditioning leads to unreliable constraint satisfaction, especially when encountering out-of-distribution cost trajectories. To address this, we propose Boundary-to-Region (B2R), a framework that enables asymmetric conditioning through cost signal realignment. B2R redefines CTG as a boundary constraint under a fixed safety budget, unifying the cost distribution of all feasible trajectories while preserving reward structures. Combined with rotary positional embeddings, it enhances exploration within the safe region. Experimental results show that B2R satisfies safety constraints in 35 out of 38 safety-critical tasks while achieving superior reward performance over baseline methods. This work highlights the limitations of symmetric token conditioning and establishes a new theoretical and practical approach for applying sequence models to safe RL.
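To make the conditioning tokens concrete, the sketch below computes the standard RTG and CTG signals as suffix sums over a toy trajectory, then shows one possible reading of "CTG as a boundary under a fixed safety budget". The function `budget_aligned_ctg`, its `budget` parameter, and the toy data are hypothetical and are not taken from the paper; the abstract does not specify the realignment rule, so this is only an illustration of how every feasible trajectory could share the same starting cost token.

```python
def suffix_sums(values):
    """Return [sum(values[t:]) for each t], the usual '-to-go' signal."""
    out, running = [], 0.0
    for v in reversed(values):
        running += v
        out.append(running)
    return out[::-1]


def budget_aligned_ctg(costs, budget):
    """Hypothetical realignment (an assumption, not the paper's rule).

    Every feasible trajectory starts from the same fixed budget, and the
    token at step t is the budget minus the cost incurred before step t.
    Trajectories whose total cost exceeds the budget return None.
    """
    if sum(costs) > budget:
        return None
    out, remaining = [], float(budget)
    for c in costs:
        out.append(remaining)
        remaining -= c
    return out


# Toy trajectory (made-up numbers, for illustration only).
rewards = [1.0, 0.5, 2.0]
costs = [0.0, 1.0, 0.0]

rtg = suffix_sums(rewards)                        # [3.5, 2.5, 2.0]
ctg = suffix_sums(costs)                          # [1.0, 1.0, 0.0]
aligned = budget_aligned_ctg(costs, budget=5.0)   # [5.0, 5.0, 4.0]
```

Under this assumed rule, the standard CTG depends on each trajectory's own total cost, whereas the aligned token always starts at the budget, which is one way the cost distribution of feasible trajectories could be unified while the RTG signal is left unchanged.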
Cite
Text
Su et al. "Boundary-to-Region Supervision for Offline Safe Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.
Markdown
[Su et al. "Boundary-to-Region Supervision for Offline Safe Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/su2025neurips-boundarytoregion/)
BibTeX
@inproceedings{su2025neurips-boundarytoregion,
title = {{Boundary-to-Region Supervision for Offline Safe Reinforcement Learning}},
author = {Su, Huikang and Peng, Dengyun and Zhuang, Zifeng and Liu, YuHan and Chen, Qiguang and Wang, Donglin and Liu, Qinghe},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/su2025neurips-boundarytoregion/}
}