Experience Replay with Likelihood-Free Importance Weights
Abstract
The use of past experiences to accelerate temporal difference (TD) learning of value functions, or experience replay, is a key component in deep reinforcement learning methods such as actor-critic. In this work, we propose to re-weight experiences based on their likelihood under the stationary distribution of the current policy, and justify this with a contraction argument over the Bellman evaluation operator. The resulting TD objective encourages small approximation errors on the value function over frequently encountered states. To balance bias (from off-policy experiences) and variance (from on-policy experiences), we use a likelihood-free density ratio estimator between on-policy and off-policy experiences, and use the learned ratios as the prioritization weights. We apply the proposed approach empirically on Soft Actor-Critic (SAC), Double DQN, and Data-regularized Q (DrQ), over 12 Atari environments and 6 tasks from the DeepMind Control Suite (DCS). We achieve superior sample complexity on 9 out of 12 Atari environments and 16 out of 24 method-task combinations for DCS compared to the best baselines.
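A minimal sketch of the re-weighting idea described in the abstract (not the authors' released code): a binary classifier is trained to distinguish near-on-policy transitions from older replay data, its logit recovers the density ratio between the two distributions, and the ratio is used to re-weight the squared TD errors. All names below (`RatioEstimator`, `weighted_td_loss`, the `temperature` knob) are illustrative assumptions.

```python
# Hypothetical sketch of likelihood-free importance weighting for experience replay.
# A classifier separates "on-policy" (recent) from "off-policy" (old) transitions;
# its logit approximates log d_on(s,a)/d_off(s,a), which re-weights TD errors.
import torch
import torch.nn as nn


class RatioEstimator(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # logit; sigmoid(logit) = P(on-policy | s, a)
        )

    def forward(self, states, actions):
        return self.net(torch.cat([states, actions], dim=-1)).squeeze(-1)


def classifier_loss(estimator, on_batch, off_batch):
    """Binary cross-entropy: label 1 for on-policy samples, 0 for off-policy samples."""
    logits_on = estimator(*on_batch)
    logits_off = estimator(*off_batch)
    bce = nn.functional.binary_cross_entropy_with_logits
    return (bce(logits_on, torch.ones_like(logits_on))
            + bce(logits_off, torch.zeros_like(logits_off)))


def importance_weights(estimator, states, actions, temperature=1.0):
    """Density ratio d_on/d_off = exp(logit / temperature), self-normalized per batch."""
    with torch.no_grad():
        w = torch.exp(estimator(states, actions) / temperature)
    return w / w.mean()


def weighted_td_loss(td_errors, weights):
    """Re-weighted TD objective: emphasizes states frequent under the current policy."""
    return (weights * td_errors.pow(2)).mean()
```

Self-normalizing the weights within each batch keeps the scale of the TD loss comparable to the unweighted objective, while the (assumed) temperature flattens the weights toward uniform, trading off bias from off-policy data against variance from relying only on recent samples.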
Cite
Text
Sinha et al. "Experience Replay with Likelihood-Free Importance Weights." Proceedings of The 4th Annual Learning for Dynamics and Control Conference, 2022.
Markdown
[Sinha et al. "Experience Replay with Likelihood-Free Importance Weights." Proceedings of The 4th Annual Learning for Dynamics and Control Conference, 2022.](https://mlanthology.org/l4dc/2022/sinha2022l4dc-experience/)
BibTeX
@inproceedings{sinha2022l4dc-experience,
title = {{Experience Replay with Likelihood-Free Importance Weights}},
author = {Sinha, Samarth and Song, Jiaming and Garg, Animesh and Ermon, Stefano},
booktitle = {Proceedings of The 4th Annual Learning for Dynamics and Control Conference},
year = {2022},
pages = {110--123},
volume = {168},
url = {https://mlanthology.org/l4dc/2022/sinha2022l4dc-experience/}
}