Beyond Target Networks: Improving Deep $q$-Learning with Functional Regularization

Abstract

A majority of recent successes in deep Reinforcement Learning are based on minimizing the squared Bellman error. However, training is often unstable due to fast-changing target $Q$-values, and target networks are employed to stabilize it by using an additional set of lagging parameters. Despite their advantages, target networks can inhibit the propagation of newly encountered rewards, which may ultimately slow down training. In this work, we address this issue by augmenting the squared Bellman error with a functional regularizer. Unlike with target networks, the regularization here is explicit, which not only enables us to use up-to-date parameters but also to control the strength of the regularization. This leads to a fast yet stable training method. Across a range of Atari environments, we demonstrate empirical improvements over target-network-based methods in terms of both sample efficiency and performance. In summary, our approach provides a fast and stable alternative to the standard squared Bellman error.
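To make the idea concrete, below is a minimal sketch (not the authors' exact implementation) of a TD loss in which the Bellman target is computed with the up-to-date online network, while an explicit functional regularizer penalizes deviation from a lagging copy of the $Q$-function. The names (`q_net`, `prior_net`, `kappa`) and the exact form of the regularizer are illustrative assumptions based on the abstract.

```python
# Sketch of a functionally regularized TD loss (PyTorch); details are assumptions.
import torch
import torch.nn.functional as F


def fr_td_loss(q_net, prior_net, batch, gamma=0.99, kappa=1.0):
    """Squared Bellman error with an explicit functional regularizer (sketch)."""
    s, a, r, s_next, done = batch  # states, actions, rewards, next states, done flags

    # Q-values of the taken actions under the current (online) parameters.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    # Bootstrap with the *online* network (no lagging target network), but stop
    # gradients through the bootstrap term.
    with torch.no_grad():
        bootstrap = r + gamma * (1.0 - done) * q_net(s_next).max(dim=1).values
    bellman_error = F.mse_loss(q_sa, bootstrap)

    # Explicit regularization in function space: keep the Q-values close to those
    # of a slowly or periodically updated copy, with strength controlled by kappa.
    with torch.no_grad():
        q_prior = prior_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    functional_reg = F.mse_loss(q_sa, q_prior)

    return bellman_error + kappa * functional_reg
```

Here `kappa` plays the role of the explicit, tunable regularization strength that a hard target-network update does not expose.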

Cite

Text

Piché et al. "Beyond Target Networks: Improving Deep $q$-Learning with Functional Regularization." NeurIPS 2021 Workshops: DeepRL, 2021.

Markdown

[Piché et al. "Beyond Target Networks: Improving Deep $q$-Learning with Functional Regularization." NeurIPS 2021 Workshops: DeepRL, 2021.](https://mlanthology.org/neuripsw/2021/piche2021neuripsw-beyond/)

BibTeX

@inproceedings{piche2021neuripsw-beyond,
  title     = {{Beyond Target Networks: Improving Deep $q$-Learning with Functional Regularization}},
  author    = {Piché, Alexandre and Marino, Joseph and Marconi, Gian Maria and Thomas, Valentin and Pal, Christopher and Khan, Mohammad Emtiyaz},
  booktitle = {NeurIPS 2021 Workshops: DeepRL},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/piche2021neuripsw-beyond/}
}