CHPO: Constrained Hybrid-Action Policy Optimization for Reinforcement Learning

Abstract

Constrained hybrid-action reinforcement learning (RL) aims to learn a safe policy over a parameterized action space, which is particularly valuable for safety-critical applications with discrete-continuous hybrid action spaces. However, existing hybrid-action RL algorithms focus primarily on reward maximization and therefore face significant challenges on tasks that involve both cost constraints and hybrid action spaces. In this work, we propose a novel Constrained Hybrid-action Policy Optimization algorithm (CHPO) to address constrained hybrid-action RL. Concretely, we revisit the limitations of hybrid-action RL in handling safe tasks with parameterized action spaces and reformulate the objective of constrained hybrid-action RL by introducing the Constrained Parameterized-action Markov Decision Process (CPMDP). We then present a constrained hybrid-action policy optimization algorithm for this setting and provide theoretical analyses showing that CHPO converges to the optimal solution while satisfying the safety constraints. Finally, extensive experiments show that CHPO achieves competitive performance across multiple tasks.
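The abstract does not spell out CHPO's update rules. To make the two ingredients it combines more concrete, the Python sketch below assumes (i) a parameterized action a = (k, x_k), in which a discrete choice k selects a continuous parameter x_k, as in the standard parameterized-action MDP setting, and (ii) a generic Lagrangian relaxation of a cost constraint J_C(pi) <= d, a common technique in constrained RL. It is only an illustration under these assumptions, not the authors' algorithm: the softmax and Gaussian policy heads and all function names (`softmax`, `sample_hybrid_action`, `hybrid_log_prob`, `lagrangian`, `dual_update`) are hypothetical.

```python
import math
import random
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the discrete action logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_hybrid_action(logits: List[float], means: List[float],
                         stds: List[float], rng: random.Random) -> Tuple[int, float]:
    """Sample a parameterized action (k, x_k).

    Hypothetical policy: pick discrete action k from a softmax, then draw its
    continuous parameter x_k from a Gaussian attached to that choice."""
    probs = softmax(logits)
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    x = rng.gauss(means[k], stds[k])
    return k, x


def hybrid_log_prob(k: int, x: float, logits: List[float],
                    means: List[float], stds: List[float]) -> float:
    """log pi(k, x) = log pi_d(k) + log N(x; mu_k, sigma_k^2)."""
    log_pk = math.log(softmax(logits)[k])
    z = (x - means[k]) / stds[k]
    log_px = -0.5 * z * z - math.log(stds[k] * math.sqrt(2.0 * math.pi))
    return log_pk + log_px


def lagrangian(reward_return: float, cost_return: float,
               lam: float, cost_limit: float) -> float:
    """Scalarized objective J_R - lam * (J_C - d) that the policy maximizes."""
    return reward_return - lam * (cost_return - cost_limit)


def dual_update(lam: float, cost_return: float,
                cost_limit: float, lr: float) -> float:
    """Projected gradient ascent on lam >= 0: lam grows while the cost
    constraint is violated and shrinks toward 0 once it is satisfied."""
    return max(0.0, lam + lr * (cost_return - cost_limit))


if __name__ == "__main__":
    rng = random.Random(0)
    k, x = sample_hybrid_action([0.0, 1.0], [0.0, 2.0], [1.0, 0.5], rng)
    print("sampled action:", k, x)
    lam = dual_update(0.0, cost_return=30.0, cost_limit=25.0, lr=0.1)
    print("multiplier after one violated episode:", lam)
```

Factoring the log-probability as a discrete term plus a continuous term is what lets a single policy-gradient or trust-region update act on both parts of the hybrid action, while the multiplier `lam` trades reward against constraint violation.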

Cite

Text

Zhou et al. "CHPO: Constrained Hybrid-Action Policy Optimization for Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zhou et al. "CHPO: Constrained Hybrid-Action Policy Optimization for Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhou2025neurips-chpo/)

BibTeX

@inproceedings{zhou2025neurips-chpo,
  title     = {{CHPO: Constrained Hybrid-Action Policy Optimization for Reinforcement Learning}},
  author    = {Zhou, Ao and Guan, Jiayi and Shen, Li and Lu, Fan and Qu, Sanqing and Zhao, Junqiao and Wang, Ziqiao and Wu, Ya and Chen, Guang},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zhou2025neurips-chpo/}
}