Overcoming the Spectral Bias of Neural Value Approximation

Abstract

Value approximation using deep neural networks is at the heart of off-policy deep reinforcement learning, and is often the primary module that provides learning signals to the rest of the algorithm. While multi-layer perceptron networks are universal function approximators, recent works in neural kernel regression suggest the presence of a *spectral bias*, where fitting high-frequency components of the value function requires exponentially more gradient update steps than the low-frequency ones. In this work, we re-examine off-policy reinforcement learning through the lens of kernel regression and propose to overcome such bias via a composite neural tangent kernel. With just a single line-change, our approach, Fourier feature networks (FFN), produces state-of-the-art performance on challenging continuous control domains with only a fraction of the compute. Faster convergence and better off-policy stability also make it possible to remove the target network without suffering catastrophic divergence, which further reduces TD(0)'s estimation bias on a few tasks. Code and analysis available at https://geyang.github.io/ffn.
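The abstract describes the approach as a single line-change to the value network. As a rough sketch of what such a change could look like, here is the standard random Fourier feature mapping (mapping inputs through sinusoids of random frequencies before the MLP); the array sizes and the Gaussian frequency scale below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def fourier_features(x, B):
    """Random Fourier feature map: x -> [cos(2*pi*B x), sin(2*pi*B x)].

    High-frequency rows of B let the downstream MLP fit high-frequency
    components of the target function much faster, counteracting the
    spectral bias of a plain MLP.
    """
    proj = 2.0 * np.pi * x @ B.T
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
state_dim, n_features, scale = 3, 16, 1.0  # hypothetical sizes and scale
B = rng.normal(0.0, scale, size=(n_features, state_dim))  # fixed random frequencies

s = rng.normal(size=(5, state_dim))   # a batch of 5 states
z = fourier_features(s, B)            # z would be fed to the value MLP
print(z.shape)                        # (5, 32): cos and sin halves concatenated
```

In this construction the "one line" is the call to `fourier_features` inserted before the first linear layer of the value network; the frequency matrix `B` is sampled once and held fixed, so the rest of the training loop is unchanged.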

Cite

Text

Yang et al. "Overcoming the Spectral Bias of Neural Value Approximation." International Conference on Learning Representations, 2022.

Markdown

[Yang et al. "Overcoming the Spectral Bias of Neural Value Approximation." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/yang2022iclr-overcoming/)

BibTeX

@inproceedings{yang2022iclr-overcoming,
  title     = {{Overcoming the Spectral Bias of Neural Value Approximation}},
  author    = {Yang, Ge and Ajay, Anurag and Agrawal, Pulkit},
  booktitle = {International Conference on Learning Representations},
  year      = {2022},
  url       = {https://mlanthology.org/iclr/2022/yang2022iclr-overcoming/}
}