Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models

Abstract

We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation. However, these approaches often remain too data-inefficient or unreliable to train on real robotic hardware. In this paper, we introduce a novel policy gradient-based policy optimization framework which systematically leverages a (possibly highly simplified) first-principles model and enables learning precise control policies with limited amounts of real-world data. Our approach (1) uses the derivatives of the model to produce sample-efficient estimates of the policy gradient and (2) uses the model to design a low-level tracking controller, which is embedded in the policy class. Theoretical analysis provides insight into how the presence of this feedback controller overcomes key limitations of stand-alone policy gradient methods, while hardware experiments with a small car and a quadruped demonstrate that our approach can learn precise control strategies reliably and with only minutes of real-world data.
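The two ingredients named in the abstract can be illustrated with a toy sketch. Everything in it is an illustrative assumption, not the authors' implementation: the 1D double-integrator "approximate model", the hypothetical `real_step` stand-in for hardware (with an actuator gain and drag the model omits), the PD gains `KP`/`KD`, the cost weights, and all function names. The policy is a learned feedforward sequence `theta` plus an embedded PD tracking controller. The hypothetical `model_gradient` estimates the policy gradient by backpropagating the cost through the approximate model's Jacobians along the measured ("real") trajectory, rather than by a likelihood-ratio estimator.

```python
# Hypothetical toy sketch; dynamics, gains and constants are assumptions.
DT, T, TARGET = 0.1, 30, 1.0
KP, KD = 4.0, 2.0  # PD gains, imagined as designed from the simple model
RHO = 0.01         # weight on control effort


def policy(theta_t, p, v):
    """Learned feedforward term plus an embedded PD tracking controller."""
    return theta_t - KP * (p - TARGET) - KD * v


def real_step(p, v, u):
    """Stand-in for hardware: actuator gain and drag unknown to the model."""
    return p + DT * v, v + DT * (0.8 * u - 0.5 * v)


def rollout(theta):
    """Closed-loop run on the 'real' system; returns positions, inputs, cost."""
    p, v = 0.0, 0.0
    ps, us, cost = [p], [], 0.0
    for t in range(T):
        u = policy(theta[t], p, v)
        p, v = real_step(p, v, u)
        ps.append(p)
        us.append(u)
        cost += (p - TARGET) ** 2 + RHO * u ** 2
    return ps, us, cost


def model_gradient(ps, us):
    """Backpropagate the cost through the approximate model
    p' = p + DT*v, v' = v + DT*u, using measured positions and inputs."""
    lam_p, lam_v = 2.0 * (ps[T] - TARGET), 0.0  # dJ/dx_T
    grad = [0.0] * T
    for t in range(T - 1, -1, -1):
        # Total derivative of J w.r.t. u_t (model input matrix B = [0, DT]).
        dj_du = 2.0 * RHO * us[t] + lam_v * DT
        grad[t] = dj_du  # du_t / dtheta_t = 1
        # dJ/dx_t = state cost + lam^T A + dj_du * du/dx (through the PD loop).
        state_cost = 2.0 * (ps[t] - TARGET) if t > 0 else 0.0
        lam_p, lam_v = (lam_p - KP * dj_du + state_cost,
                        DT * lam_p + lam_v - KD * dj_du)
    return grad


def train(iters=100, lr=1.0):
    """Gradient descent on theta: real rollouts, model-based gradients."""
    theta, costs = [0.0] * T, []
    for _ in range(iters):
        ps, us, cost = rollout(theta)  # one "real-world" episode
        costs.append(cost)
        g = model_gradient(ps, us)
        theta = [th - lr * gi for th, gi in zip(theta, g)]
    costs.append(rollout(theta)[2])
    return theta, costs
```

For example, `theta, costs = train()` runs 100 simulated "real-world" episodes; comparing `costs[0]` with `costs[-1]` shows whether the model-based gradient steps reduced the true cost despite the model mismatch. Because this toy model is linear, its Jacobians are constant; with a nonlinear model they would be re-evaluated at each measured state.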

Cite

Text

Westenbroek et al. "Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models." Conference on Robot Learning, 2023.

BibTeX

@inproceedings{westenbroek2023corl-enabling,
  title     = {{Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models}},
  author    = {Westenbroek, Tyler and Levy, Jacob and Fridovich-Keil, David},
  booktitle = {Conference on Robot Learning},
  year      = {2023},
  pages     = {2478--2497},
  volume    = {229},
  url       = {https://mlanthology.org/corl/2023/westenbroek2023corl-enabling/}
}