Deep Residual Reinforcement Learning (Extended Abstract)

Abstract

We revisit residual algorithms in both model-free and model-based reinforcement learning settings. We propose the bidirectional target network technique to stabilize residual algorithms, yielding a residual version of DDPG that significantly outperforms vanilla DDPG on commonly used benchmarks. Moreover, we find the residual algorithm an effective approach to the distribution mismatch problem in model-based planning. Compared with the existing TD(k) method, our residual-based method makes weaker assumptions about the model and yields a greater performance boost.
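For orientation, the sketch below illustrates the distinction the abstract builds on: a semi-gradient TD loss (as used in vanilla DDPG, where the bootstrapping target is held fixed) versus a Baird-style residual (Bellman residual) loss, together with one possible reading of the bidirectional target network idea. This is a minimal illustration under assumptions, not the authors' implementation; q_net, q_target_net, and the tensor arguments are hypothetical PyTorch objects introduced only for this example.

import torch

# Minimal sketch (not the authors' code). q_net and q_target_net are assumed
# PyTorch modules mapping (state, action) -> value; all names are illustrative.

def semi_gradient_loss(q_net, q_target_net, s, a, r, s_next, a_next, gamma=0.99):
    # Standard DDPG-style TD loss: the bootstrapping target is detached,
    # so gradients flow only through Q(s, a).
    with torch.no_grad():
        target = r + gamma * q_target_net(s_next, a_next)
    return ((target - q_net(s, a)) ** 2).mean()

def residual_loss(q_net, s, a, r, s_next, a_next, gamma=0.99):
    # Baird-style residual loss (Bellman residual minimization): gradients
    # flow through both Q(s, a) and Q(s', a').
    delta = r + gamma * q_net(s_next, a_next) - q_net(s, a)
    return (delta ** 2).mean()

def bidirectional_target_loss(q_net, q_target_net, s, a, r, s_next, a_next, gamma=0.99):
    # One reading (an assumption, see the paper for the exact form) of the
    # bidirectional target network: when updating Q(s, a), bootstrap from the
    # target network at (s', a'); when updating Q(s', a'), use the target
    # network at (s, a). Averaging the two terms keeps gradients flowing in
    # both directions while stabilizing each side.
    with torch.no_grad():
        target_next = q_target_net(s_next, a_next)
        target_curr = q_target_net(s, a)
    forward = ((r + gamma * target_next - q_net(s, a)) ** 2).mean()
    backward = ((r + gamma * q_net(s_next, a_next) - target_curr) ** 2).mean()
    return 0.5 * (forward + backward)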

Cite

Text

Zhang et al. "Deep Residual Reinforcement Learning (Extended Abstract)." International Joint Conference on Artificial Intelligence, 2021. doi:10.24963/IJCAI.2021/668

Markdown

[Zhang et al. "Deep Residual Reinforcement Learning (Extended Abstract)." International Joint Conference on Artificial Intelligence, 2021.](https://mlanthology.org/ijcai/2021/zhang2021ijcai-deep/) doi:10.24963/IJCAI.2021/668

BibTeX

@inproceedings{zhang2021ijcai-deep,
  title     = {{Deep Residual Reinforcement Learning (Extended Abstract)}},
  author    = {Zhang, Shangtong and Boehmer, Wendelin and Whiteson, Shimon},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {4869--4873},
  doi       = {10.24963/IJCAI.2021/668},
  url       = {https://mlanthology.org/ijcai/2021/zhang2021ijcai-deep/}
}