A Closer Look at Gradient Estimators with Reinforcement Learning as Inference

Abstract

The concept of reinforcement learning as inference (RLAI) has led to the creation of a variety of popular algorithms in deep reinforcement learning. Unfortunately, most research in this area relies on broader algorithmic innovations that are not necessarily relevant to such frameworks. Additionally, many seemingly unimportant modifications made to these algorithms actually produce inconsistencies with the original inference problem posed by RLAI. Taking a divergence minimization perspective, this work considers some of the practical merits and theoretical issues created by the choice of loss function minimized in the policy update for off-policy reinforcement learning. Our results show that while the choice of divergence rarely has a major effect on the sample efficiency of the algorithm, it can have important practical repercussions on ease of implementation, computational efficiency, and restrictions on the distribution over actions.
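To make the abstract's point about the choice of divergence concrete, the sketch below contrasts two common surrogate losses for the RLAI policy update against a Boltzmann target over Q-values, p(a|s) ∝ exp(Q(s,a)/α): the reverse KL (mode-seeking, as in SAC-style updates) and the forward KL (mass-covering, yielding a weighted maximum-likelihood update). This is an illustrative assumption-laden example, not the paper's code; it assumes discrete actions and uses PyTorch for brevity.

```python
# Hedged sketch (not the paper's implementation): two surrogate policy losses under
# the RL-as-inference view, with a Boltzmann target p(a|s) ∝ exp(Q(s,a)/alpha).
import torch
import torch.nn.functional as F

def reverse_kl_loss(policy_logits, q_values, alpha=1.0):
    """KL(pi || p): mode-seeking divergence, as used in SAC-style policy updates."""
    log_pi = F.log_softmax(policy_logits, dim=-1)        # log pi(a|s)
    log_p = F.log_softmax(q_values / alpha, dim=-1)      # log Boltzmann target
    pi = log_pi.exp()
    return (pi * (log_pi - log_p)).sum(dim=-1).mean()

def forward_kl_loss(policy_logits, q_values, alpha=1.0):
    """KL(p || pi): mass-covering divergence, giving a weighted maximum-likelihood update."""
    log_pi = F.log_softmax(policy_logits, dim=-1)
    p = F.softmax(q_values / alpha, dim=-1).detach()     # target weights, no gradient
    return -(p * log_pi).sum(dim=-1).mean()

# Toy usage: a batch of 4 states with 3 discrete actions.
logits = torch.randn(4, 3, requires_grad=True)
q = torch.randn(4, 3)
loss = reverse_kl_loss(logits, q) + forward_kl_loss(logits, q)
loss.backward()
```

Both losses target the same inference problem, but they differ in gradient estimation, implementation complexity, and the kinds of action distributions they admit, which is the trade-off the paper examines.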

Cite

Text

Lavington et al. "A Closer Look at Gradient Estimators with Reinforcement Learning as Inference." NeurIPS 2021 Workshops: DeepRL, 2021.

Markdown

[Lavington et al. "A Closer Look at Gradient Estimators with Reinforcement Learning as Inference." NeurIPS 2021 Workshops: DeepRL, 2021.](https://mlanthology.org/neuripsw/2021/lavington2021neuripsw-closer/)

BibTeX

@inproceedings{lavington2021neuripsw-closer,
  title     = {{A Closer Look at Gradient Estimators with Reinforcement Learning as Inference}},
  author    = {Lavington, Jonathan Wilder and Teng, Michael and Schmidt, Mark and Wood, Frank},
  booktitle = {NeurIPS 2021 Workshops: DeepRL},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/lavington2021neuripsw-closer/}
}