Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

Abstract

Offline Reinforcement Learning promises to learn effective policies from previously collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient from the existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC outperforms existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.
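To illustrate the core idea described in the abstract, below is a minimal PyTorch sketch of an uncertainty-weighted critic update using Monte-Carlo dropout to estimate the variance of the bootstrapped target. The network architecture, hyperparameters, and the clipped inverse-variance weighting here are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class DropoutCritic(nn.Module):
    """Q(s, a) network with dropout so stochastic forward passes can be
    used for Monte-Carlo uncertainty estimation (illustrative architecture)."""
    def __init__(self, state_dim, action_dim, hidden=256, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def mc_dropout_stats(critic, state, action, n_samples=20):
    """Mean and variance of Q(s, a) over stochastic dropout passes."""
    qs = torch.stack([critic(state, action) for _ in range(n_samples)], dim=0)
    return qs.mean(dim=0), qs.var(dim=0)

def weighted_critic_loss(critic, target_critic, batch, gamma=0.99, beta=1.0):
    """Bellman error down-weighted by the estimated uncertainty of the
    bootstrapped target; the clipped inverse-variance weight is an
    illustrative choice, not necessarily the paper's exact form."""
    s, a, r, s_next, a_next, done = batch
    with torch.no_grad():
        q_next_mean, q_next_var = mc_dropout_stats(target_critic, s_next, a_next)
        target = r + gamma * (1.0 - done) * q_next_mean
        # Down-weight transitions whose bootstrapped target is uncertain (likely OOD).
        weight = torch.clamp(beta / (q_next_var + 1e-6), max=1.0)
    td_error = critic(s, a) - target
    return (weight * td_error.pow(2)).mean()
```

The same uncertainty weights can also be applied to the actor's objective so that the policy avoids actions whose value estimates the critic is unsure about.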

Cite

Text

Wu et al. "Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning." International Conference on Machine Learning, 2021.

Markdown

[Wu et al. "Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/wu2021icml-uncertainty/)

BibTeX

@inproceedings{wu2021icml-uncertainty,
  title     = {{Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning}},
  author    = {Wu, Yue and Zhai, Shuangfei and Srivastava, Nitish and Susskind, Joshua M. and Zhang, Jian and Salakhutdinov, Ruslan and Goh, Hanlin},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {11319--11328},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/wu2021icml-uncertainty/}
}