Weighted Policy Constraints for Offline Reinforcement Learning
Abstract
Offline reinforcement learning (RL) aims to learn a policy from a passively collected offline dataset. Naively applying existing RL methods to a static dataset induces distribution shift, causing these unconstrained RL methods to fail. To cope with the distribution shift problem, a common practice in offline RL is to constrain the policy, explicitly or implicitly, to stay close to the behavior policy. However, the available dataset usually contains sub-optimal or inferior actions, and constraining the policy near all of these actions forces it to learn inferior behaviors, limiting the performance of the algorithm. Based on this observation, we propose a weighted policy constraints (wPC) method that constrains the learned policy only toward desirable behaviors, leaving room for policy improvement elsewhere. Our algorithm outperforms existing state-of-the-art offline RL algorithms on the D4RL offline gym datasets. Moreover, the proposed algorithm is simple to implement with few hyper-parameters, making wPC a robust offline RL method with low computational complexity.
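The core idea in the abstract can be illustrated as a weighted behavior-cloning regularizer: the constraint toward the dataset action is applied only where the action is judged desirable. The sketch below is a minimal illustration, not the paper's exact objective; the indicator-style weight on positive advantages, the function name `weighted_pc_loss`, and the trade-off coefficient `lam` are illustrative assumptions.

```python
import numpy as np

def weighted_pc_loss(pi_actions, data_actions, q_values, advantages, lam=1.0):
    """Sketch of a weighted policy-constraint actor loss.

    pi_actions   : actions proposed by the learned policy, shape (N, act_dim)
    data_actions : actions stored in the offline dataset,  shape (N, act_dim)
    q_values     : critic values Q(s, pi(s)),               shape (N,)
    advantages   : estimated advantage of each dataset action, shape (N,)
    lam          : trade-off between policy improvement and the constraint
    """
    # Weight is 1 only for dataset actions judged desirable (positive
    # advantage); sub-optimal actions receive no constraint, so the
    # policy is free to improve on them.
    w = (advantages > 0).astype(float)
    # Behavior-cloning term, active only where w = 1.
    bc_term = w * np.sum((pi_actions - data_actions) ** 2, axis=-1)
    # Maximize Q (minimize -Q) while staying near desirable actions.
    return np.mean(-q_values + lam * bc_term)
```

An unconstrained actor loss would drop `bc_term` entirely; a standard (unweighted) policy-constraint method would set `w = 1` everywhere, pulling the policy toward every dataset action, good or bad.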
Cite
Text
Peng et al. "Weighted Policy Constraints for Offline Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I8.26130
Markdown
[Peng et al. "Weighted Policy Constraints for Offline Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/peng2023aaai-weighted/) doi:10.1609/AAAI.V37I8.26130
BibTeX
@inproceedings{peng2023aaai-weighted,
title = {{Weighted Policy Constraints for Offline Reinforcement Learning}},
author = {Peng, Zhiyong and Han, Changlin and Liu, Yadong and Zhou, Zongtan},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2023},
pages = {9435--9443},
doi = {10.1609/AAAI.V37I8.26130},
url = {https://mlanthology.org/aaai/2023/peng2023aaai-weighted/}
}