Error Bounds and Dynamics of Bootstrapping in Actor-Critic Reinforcement Learning
Abstract
Actor-critic algorithms such as DDPG, TD3, and SAC, which build on the deterministic policy gradient theorem of Silver et al., are among the most successful reinforcement-learning methods, but their mathematical basis is not entirely clear. In particular, the critic networks in these algorithms learn to estimate action-value functions by a “bootstrapping” technique based on Bellman error, and it is unclear why this approach works so well in practice, given that Bellman error is only loosely related to value error, i.e. to the inaccuracy of the action-value estimate. Here we show that policy training in this class of actor-critic methods depends not on the accuracy of the critic's action-value estimate but on how well the critic estimates the gradient of the action value, which is better assessed using what we call difference error. We show that this difference error is closely related to the Bellman error, a finding which helps to explain why Bellman-based bootstrapping yields good policies. Further, we show that value error and difference error evolve differently along on-policy trajectories through state-action space: value error is a low-pass, anticausal (i.e., backward-in-time) filter of Bellman error, and therefore accumulates along trajectories, whereas difference error is a high-pass filter of Bellman error. It follows that techniques which reduce the high-frequency Fourier components of the Bellman error may improve policy training even if they increase the overall size of the Bellman errors. These findings help to explain certain aspects of actor-critic methods that are otherwise theoretically puzzling, such as the use of policy noise (as distinct from exploratory noise), and they suggest further measures that may improve these methods.
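The accumulation claim in the abstract can be made concrete with a standard identity (a sketch, not code from the paper): along an on-policy trajectory, if δ_t denotes the Bellman error and e_t the value error at step t, then e_t = −δ_t + γ·e_{t+1}, so the value error is a discounted backward-in-time (anticausal) accumulation of Bellman errors. The trajectory length `T`, discount `gamma`, and random Bellman errors below are illustrative assumptions.

```python
import numpy as np

# Illustrative setup (not from the paper): a trajectory of T steps with
# randomly drawn Bellman errors delta_t, and discount factor gamma.
rng = np.random.default_rng(0)
gamma = 0.99
T = 200
delta = rng.normal(size=T)  # Bellman errors along the trajectory

# Value error via the anticausal (backward-in-time) recursion
# e_t = -delta_t + gamma * e_{t+1}, with e_T = 0 at the trajectory's end.
e = np.zeros(T + 1)
for t in reversed(range(T)):
    e[t] = -delta[t] + gamma * e[t + 1]

# Equivalent closed form: e_t = -sum_{k>=t} gamma^(k-t) * delta_k,
# i.e. a discounted backward accumulation of the Bellman errors.
closed = np.array([-(gamma ** np.arange(T - t) @ delta[t:]) for t in range(T)])
assert np.allclose(e[:T], closed)
```

With gamma near 1, this discounted sum changes slowly from step to step and can grow much larger than any individual Bellman error, which is the sense in which value error is a low-pass anticausal filter of Bellman error that accumulates along trajectories.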
Cite

Zerouali and Tweed. "Error Bounds and Dynamics of Bootstrapping in Actor-Critic Reinforcement Learning." Transactions on Machine Learning Research, 2023.

BibTeX:
@article{zerouali2023tmlr-error,
title = {{Error Bounds and Dynamics of Bootstrapping in Actor-Critic Reinforcement Learning}},
author = {Zerouali, Ahmed J and Tweed, Douglas Blair},
journal = {Transactions on Machine Learning Research},
year = {2023},
url = {https://mlanthology.org/tmlr/2023/zerouali2023tmlr-error/}
}