Koopman Constrained Policy Optimization: A Koopman Operator Theoretic Method for Differentiable Optimal Control in Robotics

Abstract

We introduce Koopman Constrained Policy Optimization (KCPO), which combines implicitly differentiable model predictive control with a deep Koopman autoencoder for robot learning in unknown, nonlinear dynamical systems. KCPO is a new policy optimization algorithm that trains neural policies end-to-end with hard box constraints on controls. Guaranteed satisfaction of hard constraints helps ensure the performance and safety of robots. We perform imitation learning with KCPO to recover expert policies on the Simple Pendulum, Cartpole Swing-Up, Reacher, and Differential Drive environments. After training, KCPO outperforms baseline methods in generalizing to out-of-distribution constraints in most environments.
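The abstract names two ingredients: a learned lifting of the state into a latent space where the dynamics are linear (the deep Koopman autoencoder), and an optimal control problem solved in that space with hard box constraints on controls. The sketch below is not the authors' code; it shows what such a box-constrained latent MPC solve looks like, assuming the lifted dynamics matrices A and B have already been learned and that z0 and z_ref come from the encoder. The function name latent_mpc, the quadratic cost, and the 0.01 control weight are illustrative assumptions. KCPO additionally makes the solve implicitly differentiable so gradients flow back through the controller into the autoencoder and policy, which a generic solver call like this does not provide.

import cvxpy as cp
import numpy as np

def latent_mpc(A, B, z0, z_ref, horizon, u_min, u_max):
    """Box-constrained MPC in the lifted (Koopman) latent space.

    A, B: learned linear latent dynamics, z' = A z + B u.
    z0:   encoded initial state; z_ref: encoded goal state.
    """
    n, m = A.shape[0], B.shape[1]
    Z = cp.Variable((horizon + 1, n))  # latent state trajectory
    U = cp.Variable((horizon, m))      # control trajectory
    cost = 0
    constraints = [Z[0] == z0]
    for t in range(horizon):
        cost += cp.sum_squares(Z[t + 1] - z_ref) + 0.01 * cp.sum_squares(U[t])
        constraints += [
            Z[t + 1] == A @ Z[t] + B @ U[t],  # linear Koopman dynamics
            U[t] >= u_min,                    # hard box constraints
            U[t] <= u_max,                    # on controls
        ]
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return U.value  # in MPC fashion, only U.value[0] would be applied

# Example: random stable latent dynamics, steer the latent state to the origin.
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4) + 0.05 * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 2))
u_plan = latent_mpc(A, B, z0=rng.standard_normal(4), z_ref=np.zeros(4),
                    horizon=20, u_min=-1.0, u_max=1.0)

Because the dynamics are linear in the latent space and the box constraints are convex, the inner problem is a convex QP; this is what makes enforcing the constraints exactly, rather than by penalty, tractable inside the learning loop.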

Cite

Text

Retchin et al. "Koopman Constrained Policy Optimization: A Koopman Operator Theoretic Method for Differentiable Optimal Control in Robotics." ICML 2023 Workshops: Differentiable Almost Everything, 2023.

Markdown

[Retchin et al. "Koopman Constrained Policy Optimization: A Koopman Operator Theoretic Method for Differentiable Optimal Control in Robotics." ICML 2023 Workshops: Differentiable Almost Everything, 2023.](https://mlanthology.org/icmlw/2023/retchin2023icmlw-koopman/)

BibTeX

@inproceedings{retchin2023icmlw-koopman,
  title     = {{Koopman Constrained Policy Optimization: A Koopman Operator Theoretic Method for Differentiable Optimal Control in Robotics}},
  author    = {Retchin, Matthew and Amos, Brandon and Brunton, Steven and Song, Shuran},
  booktitle = {ICML 2023 Workshops: Differentiable Almost Everything},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/retchin2023icmlw-koopman/}
}