Program Synthesis Guided Reinforcement Learning for Partially Observed Environments

Abstract

A key challenge for reinforcement learning is solving long-horizon planning problems. Recent work has leveraged programs to guide reinforcement learning in these settings. However, these approaches impose a high manual burden on the user, who must provide a guiding program for every new task. Partially observed environments further complicate the programming task because the program must implement a strategy that correctly, and ideally optimally, handles every possible configuration of the hidden regions of the environment. We propose a new approach, model predictive program synthesis (MPPS), that uses program synthesis to automatically generate the guiding programs. It trains a generative model to predict the unobserved portions of the world, and then synthesizes a program based on samples from this model in a way that is robust to the model's uncertainty. In our experiments, we show that our approach significantly outperforms non-program-guided approaches on a set of challenging benchmarks, including a 2D Minecraft-inspired environment where the agent must complete a complex sequence of subtasks to achieve its goal, and that it achieves performance similar to using handcrafted programs to guide the agent. Our results demonstrate that our approach can obtain the benefits of program-guided reinforcement learning without requiring the user to provide a new guiding program for every new task.
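The abstract describes MPPS only at a high level. It samples completions of the hidden world from a generative model and synthesizes a program that works well across those samples. The sketch below is a minimal Python illustration of that idea. It is not the authors' implementation, and every name and modeling choice in it is a hypothetical assumption made for illustration:

- The toy world is a set of named locations holding resources.
- `sample_completion` is a uniform-random stand-in for the learned generative model.
- The program space is "a sequence of locations to visit".

Robustness to uncertainty is approximated by picking the program that succeeds on the most sampled worlds. Re-planning after every step follows the spirit of model predictive control, which is an assumption suggested by the method's name.

```python
import itertools
import random
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy world: each named location holds exactly one resource.
World = Dict[str, str]
LOCATIONS = ["a", "b", "c", "d"]
RESOURCES = ["wood", "iron", "empty"]
GOAL = frozenset({"wood", "iron"})  # the task: collect both resources


def sample_completion(observed: World, rng: random.Random) -> World:
    """Stand-in for the learned generative model (an assumption): keep the
    observed locations and fill each hidden one with a uniformly random resource."""
    world = dict(observed)
    for loc in LOCATIONS:
        if loc not in world:
            world[loc] = rng.choice(RESOURCES)
    return world


def succeeds(program: Sequence[str], world: World, collected: Set[str]) -> bool:
    """A program is a sequence of locations to visit; it succeeds if the
    resources already held plus those it picks up cover the goal."""
    return GOAL <= collected | {world[loc] for loc in program}


def synthesize(observed: World, collected: Set[str], candidates: List[str],
               rng: random.Random, num_samples: int = 50,
               max_len: int = 3) -> Tuple[str, ...]:
    """Enumerate short programs over the candidate locations and keep the one
    that succeeds on the most sampled worlds; earlier (shorter) programs win ties."""
    samples = [sample_completion(observed, rng) for _ in range(num_samples)]
    best: Tuple[str, ...] = ()
    best_score = -1
    for length in range(1, min(max_len, len(candidates)) + 1):
        for program in itertools.permutations(candidates, length):
            score = sum(succeeds(program, w, collected) for w in samples)
            if score > best_score:
                best, best_score = program, score
    return best


def run_mpps(true_world: World, observed: World, seed: int = 0) -> List[str]:
    """Re-planning loop: synthesize a plan from samples, execute only its
    first step, observe the visited location, then synthesize again."""
    rng = random.Random(seed)
    observed = dict(observed)
    collected: Set[str] = set()
    visited: List[str] = []
    while not GOAL <= collected:
        candidates = [loc for loc in LOCATIONS if loc not in visited]
        if not candidates:
            break  # every location explored; nothing left to try
        plan = synthesize(observed, collected, candidates, rng)
        step = plan[0]
        visited.append(step)
        observed[step] = true_world[step]  # visiting reveals the location
        collected.add(true_world[step])
    return visited
```

As an example, take `true_world = {"a": "wood", "b": "empty", "c": "iron", "d": "empty"}` with only location `"a"` observed. Then `run_mpps(true_world, {"a": "wood"})` returns a list of distinct visited locations that together hold both goal resources. Because only the first step of each plan is executed before re-synthesizing, every new observation tightens the samples used for the next plan.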

Cite

Yang et al. "Program Synthesis Guided Reinforcement Learning for Partially Observed Environments." Neural Information Processing Systems, 2021.

BibTeX

@inproceedings{yang2021neurips-program,
  title     = {{Program Synthesis Guided Reinforcement Learning for Partially Observed Environments}},
  author    = {Yang, Yichen and Inala, Jeevana Priya and Bastani, Osbert and Pu, Yewen and Solar-Lezama, Armando and Rinard, Martin},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/yang2021neurips-program/}
}