Weighted Policy Constraints for Offline Reinforcement Learning
Abstract
Offline reinforcement learning (RL) aims to learn a policy from a passively collected offline dataset. Naively applying existing RL methods to a static dataset induces distribution shift, causing these unconstrained RL methods to fail. To cope with the distribution shift problem, a common practice in offline RL is to constrain the policy, explicitly or implicitly, to stay close to the behavior policy. However, the available dataset usually contains sub-optimal or inferior actions, and constraining the policy near all of these actions forces it to learn inferior behaviors, limiting the performance of the algorithm. Based on this observation, we propose a weighted policy constraints (wPC) method that constrains the learned policy only toward desirable behaviors, leaving room for policy improvement elsewhere. Our algorithm outperforms existing state-of-the-art offline RL algorithms on the D4RL offline gym datasets. Moreover, the proposed algorithm is simple to implement with few hyper-parameters, making wPC a robust offline RL method with low computational complexity.
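The core idea in the abstract can be illustrated as a weighted behavior-cloning regularizer: the constraint toward the dataset action is applied only where the action is judged desirable. The sketch below is a minimal illustration, not the paper's exact objective; the indicator-style weight on positive advantages, the function name `weighted_pc_loss`, and the trade-off coefficient `lam` are illustrative assumptions.

```python
import numpy as np

def weighted_pc_loss(pi_actions, data_actions, q_values, advantages, lam=1.0):
    """Sketch of a weighted policy-constraint actor loss.

    pi_actions   : actions proposed by the learned policy, shape (N, act_dim)
    data_actions : actions stored in the offline dataset,  shape (N, act_dim)
    q_values     : critic values Q(s, pi(s)),               shape (N,)
    advantages   : estimated advantage of each dataset action, shape (N,)
    lam          : trade-off between policy improvement and the constraint
    """
    # Weight is 1 only for dataset actions judged desirable (positive
    # advantage); sub-optimal actions receive no constraint, so the
    # policy is free to improve on them.
    w = (advantages > 0).astype(float)
    # Behavior-cloning term, active only where w = 1.
    bc_term = w * np.sum((pi_actions - data_actions) ** 2, axis=-1)
    # Maximize Q (minimize -Q) while staying near desirable actions.
    return np.mean(-q_values + lam * bc_term)
```

An unconstrained actor loss would drop `bc_term` entirely; a standard (unweighted) policy-constraint method would set `w = 1` everywhere, pulling the policy toward every dataset action, good or bad.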
Cite
Text
Peng et al. "Weighted Policy Constraints for Offline Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I8.26130
Markdown
[Peng et al. "Weighted Policy Constraints for Offline Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/peng2023aaai-weighted/) doi:10.1609/AAAI.V37I8.26130
BibTeX
@inproceedings{peng2023aaai-weighted,
title = {{Weighted Policy Constraints for Offline Reinforcement Learning}},
author = {Peng, Zhiyong and Han, Changlin and Liu, Yadong and Zhou, Zongtan},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2023},
pages = {9435--9443},
doi = {10.1609/AAAI.V37I8.26130},
url = {https://mlanthology.org/aaai/2023/peng2023aaai-weighted/}
}