Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning
Abstract
Offline reinforcement learning learns a policy from a static dataset without interacting with the environment, which keeps the training process safe and gives the approach promising application prospects. However, directly applying naive reinforcement learning algorithms in the offline setting usually fails, because the Q-value approximator inaccurately estimates the Q-values of out-of-distribution (OOD) state-action pairs. Penalizing the Q-values of OOD state-action pairs is an effective way to solve this problem. Among such penalty methods, count-based methods have proven effective in discrete domains while remaining simple to formulate. Inspired by them, this work proposes the grid-mapping pseudo-count method (GPC), a novel pseudo-count method that extends the count-based approach from discrete to continuous environments. Firstly, the continuous state and action spaces are mapped to discrete spaces using grid mapping, and the Q-values of OOD state-action pairs are then constrained through pseudo-counts. Secondly, a theoretical proof demonstrates that GPC obtains appropriate uncertainty constraints. Thirdly, GPC is combined with the soft actor-critic algorithm (SAC) to create a novel algorithm called GPC-SAC. Finally, experiments on D4RL datasets show that GPC-SAC achieves state-of-the-art performance at a lower computational cost than other algorithms that constrain the Q-value.
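The grid-mapping idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact constraint, so the penalty form `beta / sqrt(n + 1)`, the uniform grid, and all names (`grid_map`, `GridPseudoCount`, `bins`, `beta`) are assumptions chosen for illustration, borrowed from generic count-based uncertainty bonuses.

```python
import numpy as np
from collections import Counter


def grid_map(x, low, high, bins):
    """Map continuous vectors to integer grid-cell indices, one per dimension."""
    x = np.asarray(x, dtype=float)
    scaled = (x - low) / (high - low)          # normalise each dimension to [0, 1]
    idx = np.floor(scaled * bins).astype(int)  # uniform grid: cell index per dimension
    return np.clip(idx, 0, bins - 1)           # keep boundary values inside the grid


class GridPseudoCount:
    """Hypothetical helper: counts dataset (state, action) pairs per grid cell."""

    def __init__(self, s_low, s_high, a_low, a_high, bins=10):
        self.low = np.concatenate([s_low, a_low]).astype(float)
        self.high = np.concatenate([s_high, a_high]).astype(float)
        self.bins = bins
        self.counts = Counter()

    def _keys(self, states, actions):
        # Concatenate state and action, then discretise the joint vector.
        sa = np.concatenate([np.atleast_2d(states), np.atleast_2d(actions)], axis=1)
        cells = grid_map(sa, self.low, self.high, self.bins)
        return [tuple(row) for row in cells]

    def fit(self, states, actions):
        self.counts.update(self._keys(states, actions))
        return self

    def count(self, states, actions):
        # Unseen cells have count 0 (Counter returns 0 for missing keys).
        return np.array([self.counts[k] for k in self._keys(states, actions)], dtype=float)

    def penalty(self, states, actions, beta=1.0):
        # Assumed penalty form: large for rarely seen cells, small for frequent ones.
        return beta / np.sqrt(self.count(states, actions) + 1.0)


# Toy dataset that only covers negative states, so positive states are OOD.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 0.0, size=(500, 2))
actions = rng.uniform(-1.0, 1.0, size=(500, 1))

pc = GridPseudoCount([-1, -1], [1, 1], [-1], [1], bins=4).fit(states, actions)
in_dist = pc.penalty([[-0.5, -0.5]], [[0.0]])  # cell visited by the dataset
ood = pc.penalty([[0.5, 0.5]], [[0.0]])        # cell never visited
```

In a critic update, such a penalty would typically be subtracted from the Q-value target for the queried state-action pair, so that pairs falling in unvisited cells receive the largest reduction. How GPC-SAC applies its constraint in detail is specified in the paper, not here.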
Cite
Text
Shen and Huang. "Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning." Machine Learning, 2025. doi:10.1007/S10994-025-06925-8
Markdown
[Shen and Huang. "Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/shen2025mlj-gridmapping/) doi:10.1007/S10994-025-06925-8
BibTeX
@article{shen2025mlj-gridmapping,
title = {{Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning}},
author = {Shen, Yi and Huang, Hanyan},
journal = {Machine Learning},
year = {2025},
pages = {277},
doi = {10.1007/S10994-025-06925-8},
volume = {114},
url = {https://mlanthology.org/mlj/2025/shen2025mlj-gridmapping/}
}