Truncated Gaussian Policy for Debiased Continuous Control

Abstract

In continuous domains, reinforcement learning policies are often based on Gaussian distributions for their generality. However, the unbounded support of the Gaussian policy can cause a bias toward sampling boundary actions in many continuous control tasks that impose action limits due to physical constraints. This "boundary action bias" can negatively impact training in algorithms such as Proximal Policy Optimization, yet it has been overlooked in much existing research and in many applications. In this paper, we revisit this issue, presenting illustrative explanations and analysis from a sampling point of view. We then introduce a truncated Gaussian policy with inherent bounds as a minimal alternative that mitigates the bias. However, we find that the plain truncated Gaussian policy may introduce a counter-bias, preferring interior actions; to balance the two, we ultimately propose a scale-adjusted truncated Gaussian policy, in which the distribution's scale shrinks as its location approaches the boundaries. This property makes boundary actions more deterministic than under the plain truncated Gaussian policy, but still less so than under the original Gaussian policy. Extensive empirical studies and comparisons on various continuous control tasks demonstrate that the truncated Gaussian policies significantly reduce the rate of boundary action usage, while the scale-adjusted variants successfully balance the bias and counter-bias. The scale-adjusted policy generally outperforms the Gaussian policy and shows competitive results compared to other approaches designed to counteract the bias.
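The sketch below illustrates the idea of sampling actions from a truncated Gaussian on a bounded action interval, with a scale that shrinks as the location parameter approaches the bounds. It is a minimal illustration only: the specific shrinkage rule in adjusted_scale and the bound values are assumptions for demonstration, not the adjustment proposed in the paper.

import numpy as np
from scipy.stats import truncnorm

def truncated_gaussian(mu, sigma, low=-1.0, high=1.0):
    # scipy's truncnorm expects standardized bounds a, b relative to loc/scale.
    a, b = (low - mu) / sigma, (high - mu) / sigma
    return truncnorm(a, b, loc=mu, scale=sigma)

def adjusted_scale(mu, sigma, low=-1.0, high=1.0):
    # Assumed adjustment (illustrative only): shrink the scale in proportion to
    # the distance from mu to the nearest boundary, so that actions near the
    # bounds become more deterministic than under the plain truncated Gaussian.
    dist_to_bound = min(mu - low, high - mu) / (0.5 * (high - low))
    return sigma * float(np.clip(dist_to_bound, 1e-3, 1.0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for mu in (-0.95, 0.0, 0.95):
        sigma = 0.5
        plain = truncated_gaussian(mu, sigma)
        scaled = truncated_gaussian(mu, adjusted_scale(mu, sigma))
        print(f"mu={mu:+.2f}  plain sample={plain.rvs(random_state=rng):+.3f}  "
              f"scale-adjusted sample={scaled.rvs(random_state=rng):+.3f}")

Under this assumed rule, a location near a boundary yields a small scale, so samples concentrate near that boundary rather than piling up exactly on it (as an unbounded Gaussian with clipping would) or being pushed back toward the interior (as a plain truncated Gaussian tends to do).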

Cite

Text

Lee et al. "Truncated Gaussian Policy for Debiased Continuous Control." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I17.33988

Markdown

[Lee et al. "Truncated Gaussian Policy for Debiased Continuous Control." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/lee2025aaai-truncated/) doi:10.1609/AAAI.V39I17.33988

BibTeX

@inproceedings{lee2025aaai-truncated,
  title     = {{Truncated Gaussian Policy for Debiased Continuous Control}},
  author    = {Lee, Ganghun and Kim, Minji and Lee, Minsu and Zhang, Byoung-Tak},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {18071--18081},
  doi       = {10.1609/AAAI.V39I17.33988},
  url       = {https://mlanthology.org/aaai/2025/lee2025aaai-truncated/}
}