Extracting Strong Policies for Robotics Tasks from Zero-Order Trajectory Optimizers

Abstract

Solving high-dimensional, continuous robotic tasks is a challenging optimization problem. Model-based methods that rely on zero-order optimizers like the cross-entropy method (CEM) have so far shown strong performance and are considered state of the art in the model-based reinforcement learning community. However, this success comes at the cost of high computational complexity, making these methods unsuitable for real-time control. In this paper, we propose a technique to jointly optimize the trajectory and distill a policy, which is essential for fast execution on real robotic systems. Our method builds upon standard approaches, such as guidance cost and dataset aggregation, and introduces a novel adaptive factor that prevents the optimizer from collapsing to the learner's behavior at the beginning of training. The extracted policies reach unprecedented performance on challenging tasks such as making a humanoid stand up and opening a door without reward shaping.
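
As a rough illustration of the planner-side idea (a minimal sketch, not the paper's exact algorithm), the snippet below shows a CEM planning loop whose trajectory cost is augmented with a guidance term penalizing deviation from the current policy. The names `env_step`, `policy`, and the fixed weight `beta` are placeholder assumptions; in the paper the weight of the guidance term is adapted over training so the optimizer does not collapse to the learner's behavior early on.

```python
import numpy as np

def cem_plan(env_step, state, policy, horizon=20, pop=64, elites=8,
             iters=5, act_dim=2, beta=0.1, seed=0):
    """Illustrative CEM planner with a policy-guidance cost.

    env_step(state, action) -> (next_state, cost) is an assumed
    dynamics/cost model; policy(state) -> action is the learner
    being distilled. All names here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences around the current mean.
        samples = mean + std * rng.standard_normal((pop, horizon, act_dim))
        costs = np.zeros(pop)
        for i, seq in enumerate(samples):
            s = state
            for a in seq:
                pi_a = policy(s)  # learner's proposal at this state
                s, c = env_step(s, a)
                # Guidance term: keep the planner close to the policy
                # (fixed beta here; adaptive in the actual method).
                costs[i] += c + beta * np.sum((a - pi_a) ** 2)
        # Refit the sampling distribution to the elite sequences.
        elite = samples[np.argsort(costs)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean[0]  # execute only the first action, MPC style
```

Executing only the first action and replanning at every step is the standard model-predictive-control pattern; the distilled policy then amortizes this expensive loop into a single fast forward pass.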

Cite

Text

Pinneri et al. "Extracting Strong Policies for Robotics Tasks from Zero-Order Trajectory Optimizers." International Conference on Learning Representations, 2021.

Markdown

[Pinneri et al. "Extracting Strong Policies for Robotics Tasks from Zero-Order Trajectory Optimizers." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/pinneri2021iclr-extracting/)

BibTeX

@inproceedings{pinneri2021iclr-extracting,
  title     = {{Extracting Strong Policies for Robotics Tasks from Zero-Order Trajectory Optimizers}},
  author    = {Pinneri, Cristina and Sawant, Shambhuraj and Blaes, Sebastian and Martius, Georg},
  booktitle = {International Conference on Learning Representations},
  year      = {2021},
  url       = {https://mlanthology.org/iclr/2021/pinneri2021iclr-extracting/}
}