In-Sample Actor Critic for Offline Reinforcement Learning

Abstract

Offline reinforcement learning suffers from the out-of-distribution issue and extrapolation error. Most methods penalize out-of-distribution state-action pairs or regularize the trained policy towards the behavior policy, but they cannot guarantee the elimination of extrapolation error. We propose In-sample Actor Critic (IAC), which utilizes sampling-importance resampling to execute in-sample policy evaluation. IAC uses only the target Q-values of the actions in the dataset to evaluate the trained policy, thus avoiding extrapolation error. The proposed method performs unbiased policy evaluation and has a lower variance than importance sampling in many cases. Empirical results show that IAC achieves performance competitive with state-of-the-art methods on the Gym-MuJoCo locomotion domains and the much more challenging AntMaze domains.
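To make the in-sample evaluation idea in the abstract concrete, below is a minimal sketch of sampling-importance resampling (SIR) applied to TD targets, written in PyTorch. It is an illustration of the general technique rather than the authors' released implementation: the objects `policy`, `behavior_policy`, `q_target`, and the `batch` fields (`state`, `action`, `reward`, `done`, `next_state`, `next_action`) are hypothetical placeholders, and the exact weighting and update scheme in IAC may differ.

```python
import torch


def in_sample_td_targets(batch, policy, behavior_policy, q_target, gamma=0.99):
    """SARSA-style TD targets on transitions resampled via SIR.

    Weights pi(a'|s') / beta(a'|s') make the resampled next-actions behave
    approximately like samples from the trained policy, yet every Q-value is
    queried only at actions that actually appear in the dataset, so no
    out-of-distribution action is ever evaluated.
    """
    with torch.no_grad():
        # Log importance ratios of the trained policy over the behavior policy
        # at the in-dataset next actions.
        log_w = (policy.log_prob(batch.next_state, batch.next_action)
                 - behavior_policy.log_prob(batch.next_state, batch.next_action))
        w = torch.softmax(log_w, dim=0)          # self-normalized weights

        # Resampling step: draw transition indices proportional to the weights.
        idx = torch.multinomial(w, num_samples=len(w), replacement=True)

        # Bellman backup using only dataset actions at the next state.
        q_next = q_target(batch.next_state[idx], batch.next_action[idx])
        targets = (batch.reward[idx]
                   + gamma * (1.0 - batch.done[idx]) * q_next)

    # `idx` also selects the (s, a) pairs whose Q-values regress onto `targets`.
    return targets, idx
```

In this sketch the resampling replaces the usual expectation over actions drawn from the trained policy, which is what allows the backup to stay entirely within the dataset's support.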

Cite

Text

Zhang et al. "In-Sample Actor Critic for Offline Reinforcement Learning." International Conference on Learning Representations, 2023.

Markdown

[Zhang et al. "In-Sample Actor Critic for Offline Reinforcement Learning." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/zhang2023iclr-insample/)

BibTeX

@inproceedings{zhang2023iclr-insample,
  title     = {{In-Sample Actor Critic for Offline Reinforcement Learning}},
  author    = {Zhang, Hongchang and Mao, Yixiu and Wang, Boyuan and He, Shuncheng and Xu, Yi and Ji, Xiangyang},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/zhang2023iclr-insample/}
}