A Probabilistic Perspective on Reinforcement Learning via Supervised Learning

Abstract

Reinforcement Learning via Supervised Learning (RvS) uses only supervised learning techniques to learn desirable behaviors from large datasets. RvS has attracted much attention lately due to its simplicity and its ability to leverage diverse trajectories. We introduce Density to Decision (D2D), a new framework that unifies a myriad of RvS algorithms. The Density to Decision framework formulates RvS as a two-step process: i) density estimation via supervised learning and ii) decision making via exponential tilting of the density. Using our framework, we categorize popular RvS algorithms and show how they differ in the design choices made in their implementations. We then introduce a novel algorithm, Implicit RvS, which leverages powerful density estimation techniques whose densities can easily be tilted to produce desirable behaviors. We compare the performance of a suite of RvS algorithms on the D4RL benchmark. Finally, we highlight the limitations of current RvS algorithms compared with traditional RL ones.
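For intuition, the second step reweights a learned density toward high-return behavior. A minimal sketch of exponential tilting, assuming a learned trajectory density $p(\tau)$, a return function $R(\tau)$, and a temperature $\eta$ (these symbols are chosen here for illustration and are not taken from the abstract):

$$p_\eta(\tau) \;=\; \frac{p(\tau)\,\exp\!\big(\eta\, R(\tau)\big)}{Z(\eta)}, \qquad Z(\eta) \;=\; \int p(\tau)\,\exp\!\big(\eta\, R(\tau)\big)\, d\tau.$$

Setting $\eta = 0$ recovers the behavior density estimated in step one, while increasing $\eta$ tilts the density toward trajectories with higher return.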

Cite

Text

Piché et al. "A Probabilistic Perspective on Reinforcement Learning via Supervised Learning." ICLR 2022 Workshops: GPL, 2022.

Markdown

[Piché et al. "A Probabilistic Perspective on Reinforcement Learning via Supervised Learning." ICLR 2022 Workshops: GPL, 2022.](https://mlanthology.org/iclrw/2022/piche2022iclrw-probabilistic/)

BibTeX

@inproceedings{piche2022iclrw-probabilistic,
  title     = {{A Probabilistic Perspective on Reinforcement Learning via Supervised Learning}},
  author    = {Piché, Alexandre and Pardinas, Rafael and Vazquez, David and Pal, Christopher},
  booktitle = {ICLR 2022 Workshops: GPL},
  year      = {2022},
  url       = {https://mlanthology.org/iclrw/2022/piche2022iclrw-probabilistic/}
}