A Pontryagin Perspective on Reinforcement Learning

Abstract

Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a _closed-loop_ fashion. In this work, we introduce the paradigm of _open-loop reinforcement learning_, where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman's equation from dynamic programming, our work builds on _Pontryagin's principle_ from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, demonstrating remarkable performance compared to existing baselines.
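To make the Pontryagin viewpoint concrete, the sketch below optimizes a fixed open-loop action sequence with the discrete-time costate (adjoint) recursion on a pendulum swing-up model. This is only an illustration of the general principle the abstract refers to, not the paper's three algorithms; the dynamics parameters, reward, horizon, and step sizes are assumptions made for this example.

```python
# Minimal sketch (assumed setup, not the paper's method): open-loop optimization
# of an action sequence via the discrete-time Pontryagin/adjoint recursion,
# applied to an illustrative pendulum swing-up model.
import numpy as np

dt, T = 0.05, 60           # time step and horizon (assumed)
g, l, m = 9.81, 1.0, 1.0   # pendulum parameters (assumed)


def step(x, u):
    """Euler-discretized pendulum dynamics; x = (theta, omega), theta = 0 is down."""
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * (-g / l * np.sin(theta) + u / (m * l**2))])


def reward(x, u):
    """Quadratic swing-up reward: target theta = pi (upright), small control penalty."""
    return -((x[0] - np.pi) ** 2 + 0.01 * u ** 2)


def grad_return(x0, us):
    """Gradient of the return w.r.t. the open-loop action sequence,
    computed with the backward costate recursion of Pontryagin's principle."""
    xs = [x0]
    for u in us:                      # forward pass: roll out the trajectory
        xs.append(step(xs[-1], u))
    lam = np.zeros(2)                 # terminal costate (no terminal reward)
    grads = np.zeros_like(us)
    for t in reversed(range(len(us))):
        theta, _ = xs[t]
        fx = np.array([[1.0, dt],
                       [-dt * g / l * np.cos(theta), 1.0]])  # df/dx
        fu = np.array([0.0, dt / (m * l**2)])                # df/du
        rx = np.array([-2.0 * (theta - np.pi), 0.0])         # dr/dx
        ru = -0.02 * us[t]                                    # dr/du
        grads[t] = ru + fu @ lam      # dJ/du_t
        lam = rx + fx.T @ lam         # costate update: lambda_t from lambda_{t+1}
    return grads


x0 = np.array([0.0, 0.0])             # start hanging down, at rest
us = np.zeros(T)                      # fixed (open-loop) action sequence
for _ in range(2000):                 # normalized gradient ascent on the actions
    grad = grad_return(x0, us)
    us += 0.1 * grad / (np.linalg.norm(grad) + 1e-8)

x = x0
for u in us:
    x = step(x, u)
print("final angle (rad):", x[0])     # should approach pi (upright)
```

The backward pass plays the role of the costate equation in Pontryagin's principle: no value function or Bellman backup is involved, and the decision variable is the action sequence itself rather than a state-dependent policy.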

Cite

Text

Eberhard et al. "A Pontryagin Perspective on Reinforcement Learning." ICML 2024 Workshops: RLControlTheory, 2024.

Markdown

[Eberhard et al. "A Pontryagin Perspective on Reinforcement Learning." ICML 2024 Workshops: RLControlTheory, 2024.](https://mlanthology.org/icmlw/2024/eberhard2024icmlw-pontryagin/)

BibTeX

@inproceedings{eberhard2024icmlw-pontryagin,
  title     = {{A Pontryagin Perspective on Reinforcement Learning}},
  author    = {Eberhard, Onno and Vernade, Claire and Muehlebach, Michael},
  booktitle = {ICML 2024 Workshops: RLControlTheory},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/eberhard2024icmlw-pontryagin/}
}