Policies Modulating Trajectory Generators

Abstract

We propose an architecture for learning complex controllable behaviors by having simple Policies Modulate Trajectory Generators (PMTG), a powerful combination that can provide both memory and prior knowledge to the controller. The result is a flexible architecture that is applicable to a class of problems with periodic motion for which one has an insight into the class of trajectories that might lead to a desired behavior. We illustrate the basics of our architecture using a synthetic control problem, then go on to learn speed-controlled locomotion for a quadrupedal robot by using Deep Reinforcement Learning and Evolutionary Strategies. We demonstrate that a simple linear policy, when paired with a parametric Trajectory Generator for quadrupedal gaits, can induce walking behaviors with controllable speed from 4-dimensional IMU observations alone, and can be learned in under 1000 rollouts. We also transfer these policies to a real robot and show locomotion with controllable forward velocity.
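
The control loop the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the one-dimensional sine-wave generator, the hypothetical names `SineTrajectoryGenerator`, `linear_policy`, and `rollout`, the choice of amplitude and frequency as the modulated parameters, and the residual correction the policy adds to the generator's output. The generator's internal phase stands in for the memory, and the shape of its output stands in for the prior knowledge.

```python
import numpy as np


class SineTrajectoryGenerator:
    """Hypothetical 1-D trajectory generator: a sine wave driven by an internal phase."""

    def __init__(self):
        self.phase = 0.0  # internal state carried across time steps

    def step(self, amplitude, frequency, dt):
        # Advance the phase, wrap it to [0, 2*pi), and emit the nominal trajectory value.
        self.phase = (self.phase + 2.0 * np.pi * frequency * dt) % (2.0 * np.pi)
        return amplitude * np.sin(self.phase)


def linear_policy(weights, observation, phase):
    """Map [observation, sin(phase), cos(phase)] linearly to
    (amplitude, frequency, residual). This output layout is an assumption."""
    features = np.concatenate([observation, [np.sin(phase), np.cos(phase)]])
    return weights @ features


def rollout(weights, observe, n_steps=100, dt=0.01):
    """Run the policy and generator together and return the action sequence."""
    tg = SineTrajectoryGenerator()
    actions = []
    for _ in range(n_steps):
        obs = observe()
        # The policy reads the observation and the generator's phase,
        # then chooses the generator's parameters for this step.
        amplitude, frequency, residual = linear_policy(weights, obs, tg.phase)
        # Final action: generator output plus the policy's direct correction.
        actions.append(tg.step(amplitude, frequency, dt) + residual)
    return np.array(actions)
```

With a 4-dimensional observation, `weights` has shape `(3, 6)` in this sketch: three outputs, and four observation entries plus two phase features as inputs. A search method such as an evolution strategy would then adjust `weights` to maximize return over rollouts.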

Cite

Text

Iscen et al. "Policies Modulating Trajectory Generators." Conference on Robot Learning, 2018.

Markdown

[Iscen et al. "Policies Modulating Trajectory Generators." Conference on Robot Learning, 2018.](https://mlanthology.org/corl/2018/iscen2018corl-policies/)

BibTeX

@inproceedings{iscen2018corl-policies,
  title     = {{Policies Modulating Trajectory Generators}},
  author    = {Iscen, Atil and Caluwaerts, Ken and Tan, Jie and Zhang, Tingnan and Coumans, Erwin and Sindhwani, Vikas and Vanhoucke, Vincent},
  booktitle = {Conference on Robot Learning},
  year      = {2018},
  pages     = {916--926},
  url       = {https://mlanthology.org/corl/2018/iscen2018corl-policies/}
}