Raisin: Residual Algorithms for Versatile Offline Reinforcement Learning
Abstract
The residual gradient algorithm (RG), gradient descent on the Mean Squared Bellman Error, brings robust convergence guarantees to bootstrapped value estimation. Meanwhile, the far more common semi-gradient algorithm (SG) suffers from well-known instabilities and divergence. Unfortunately, RG often converges slowly in practice. Baird (1995) proposed residual algorithms (RA), a weighted average of RG and SG, to combine RG’s robust convergence and SG’s speed. RA works moderately well in the online setting. We find, however, that RA works disproportionately well in the offline setting. Concretely, we find that merely adding a variable residual component to SAC gives state-of-the-art scores for about half of the D4RL gym tasks. We further show that using the minimum of ten critics lets our algorithm approximately match SAC-$N$'s state-of-the-art returns using 50$\times$ less compute. In contrast, TD3+BC with the same minimum-of-ten-critics trick does not match SAC-$N$'s returns on many environments. The only hyperparameter we tune is our residual weight; we leave all other hyperparameters unchanged from SAC-$N$.
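To make the weighted averaging of RG and SG concrete, here is a minimal sketch for a linear value function on a single transition. It is not the authors' SAC-based implementation: the function names, the `residual_weight` parameter name, and the toy features are assumptions chosen for illustration. With the TD error $\delta = r + \gamma V(s') - V(s)$, the semi-gradient step moves the weights along $\delta\,\nabla V(s)$, the residual gradient step moves them along $\delta\,(\nabla V(s) - \gamma \nabla V(s'))$, and the blended step uses $\delta\,(\nabla V(s) - \eta\gamma \nabla V(s'))$ for a residual weight $\eta \in [0, 1]$.

```python
def td_error(w, phi_s, phi_next, reward, gamma):
    """TD error for a linear value function V(s) = w . phi(s)."""
    v_s = sum(wi * xi for wi, xi in zip(w, phi_s))
    v_next = sum(wi * xi for wi, xi in zip(w, phi_next))
    return reward + gamma * v_next - v_s


def residual_update(w, phi_s, phi_next, reward, gamma, residual_weight, lr):
    """One blended update step (hypothetical helper, illustration only).

    residual_weight = 0.0 gives the semi-gradient (SG) step,
    residual_weight = 1.0 gives the residual gradient (RG) step,
    values in between give a weighted average of the two.
    """
    delta = td_error(w, phi_s, phi_next, reward, gamma)
    return [
        wi + lr * delta * (xs - residual_weight * gamma * xn)
        for wi, xs, xn in zip(w, phi_s, phi_next)
    ]


# Toy transition with one-hot features (assumed values).
w0 = [0.0, 0.0]
phi_s, phi_next = [1.0, 0.0], [0.0, 1.0]
args = dict(phi_s=phi_s, phi_next=phi_next, reward=1.0, gamma=0.9, lr=0.1)

w_sg = residual_update(w0, residual_weight=0.0, **args)
w_rg = residual_update(w0, residual_weight=1.0, **args)
w_ra = residual_update(w0, residual_weight=0.5, **args)
print(w_sg, w_rg, w_ra)
```

In this toy case the SG step leaves the next state's weight untouched, while the RG step also pushes the next state's value down to shrink the Bellman error from both sides. The blended step lies between the two, which is the trade-off the residual weight controls.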
Cite
Text
Snyder and Zhu. "Raisin: Residual Algorithms for Versatile Offline Reinforcement Learning." NeurIPS 2022 Workshops: Offline_RL, 2022.
Markdown
[Snyder and Zhu. "Raisin: Residual Algorithms for Versatile Offline Reinforcement Learning." NeurIPS 2022 Workshops: Offline_RL, 2022.](https://mlanthology.org/neuripsw/2022/snyder2022neuripsw-raisin/)
BibTeX
@inproceedings{snyder2022neuripsw-raisin,
title = {{Raisin: Residual Algorithms for Versatile Offline Reinforcement Learning}},
author = {Snyder, Braham and Zhu, Yuke},
booktitle = {NeurIPS 2022 Workshops: Offline_RL},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/snyder2022neuripsw-raisin/}
}