A Pontryagin Perspective on Reinforcement Learning

Abstract

Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman’s equation from dynamic programming, our work builds on Pontryagin’s principle from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, significantly outperforming existing baselines.
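To make the open-loop paradigm concrete, the following is a minimal, hypothetical sketch of model-based open-loop optimization: plain gradient descent on a fixed action sequence, differentiating the rollout cost through known dynamics. The gradient that jax.grad computes here coincides with the discrete-time adjoint (Pontryagin) gradient. The pendulum-like model, cost weights, horizon, and all names are illustrative assumptions and not taken from the paper.

import jax
import jax.numpy as jnp

def dynamics(x, u, dt=0.05):
    # Toy pendulum-like dynamics with unit constants (illustrative only).
    theta, omega = x[0], x[1]
    omega_next = omega + dt * (jnp.sin(theta) + u[0])
    theta_next = theta + dt * omega_next
    return jnp.array([theta_next, omega_next])

def cost(x, u):
    # Quadratic penalty on distance from the upright position and on torque.
    return (x[0] - jnp.pi) ** 2 + 0.1 * x[1] ** 2 + 0.01 * u[0] ** 2

def trajectory_cost(actions, x0):
    # Roll out the fixed (open-loop) action sequence and sum the stage costs.
    def step(x, u):
        return dynamics(x, u), cost(x, u)
    _, costs = jax.lax.scan(step, x0, actions)
    return jnp.sum(costs)

x0 = jnp.array([0.0, 0.0])      # start hanging down, at rest
actions = jnp.zeros((100, 1))   # horizon of 100 steps, one torque per step
grad_fn = jax.jit(jax.grad(trajectory_cost))

for _ in range(500):            # gradient descent directly on the action sequence
    actions = actions - 0.1 * grad_fn(actions, x0)

In the paper's terms, a sketch like this would correspond to the model-based setting; the model-free methods presumably avoid differentiating through an exact model, but for those details the reader should consult the paper itself.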

Cite

Text

Eberhard et al. "A Pontryagin Perspective on Reinforcement Learning." Proceedings of the 7th Annual Learning for Dynamics & Control Conference, 2025.

Markdown

[Eberhard et al. "A Pontryagin Perspective on Reinforcement Learning." Proceedings of the 7th Annual Learning for Dynamics & Control Conference, 2025.](https://mlanthology.org/l4dc/2025/eberhard2025l4dc-pontryagin/)

BibTeX

@inproceedings{eberhard2025l4dc-pontryagin,
  title     = {{A Pontryagin Perspective on Reinforcement Learning}},
  author    = {Eberhard, Onno and Vernade, Claire and Muehlebach, Michael},
  booktitle = {Proceedings of the 7th Annual Learning for Dynamics \& Control Conference},
  year      = {2025},
  pages     = {233--244},
  volume    = {283},
  url       = {https://mlanthology.org/l4dc/2025/eberhard2025l4dc-pontryagin/}
}