Program Synthesis Guided Reinforcement Learning for Partially Observed Environments

Yang, Yichen; Inala, Jeevana Priya; Bastani, Osbert; Pu, Yewen; Solar-Lezama, Armando; Rinard, Martin

NeurIPS 2021

Abstract

A key challenge for reinforcement learning is solving long-horizon planning problems. Recent work has leveraged programs to guide reinforcement learning in these settings. However, these approaches impose a high manual burden on users, who must provide a guiding program for every new task. Partially observed environments further complicate the programming task because the program must implement a strategy that correctly, and ideally optimally, handles every possible configuration of the hidden regions of the environment. We propose a new approach, model predictive program synthesis (MPPS), that uses program synthesis to automatically generate the guiding programs. It trains a generative model to predict the unobserved portions of the world, and then synthesizes a program based on samples from this model in a way that is robust to its uncertainty. In our experiments, our approach significantly outperforms non-program-guided approaches on a set of challenging benchmarks, including a 2D Minecraft-inspired environment where the agent must complete a complex sequence of subtasks to achieve its goal, and achieves performance similar to that obtained with handcrafted guiding programs. Our results demonstrate that our approach can obtain the benefits of program-guided reinforcement learning without requiring the user to provide a new guiding program for every new task.
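To make the loop described in the abstract concrete, the following is a toy sketch only, not the authors' implementation. It assumes a hypothetical world with three rooms, one of which hides a key. The learned generative model is replaced by a uniform sampler over rooms not yet seen to be empty, and the program synthesizer is replaced by brute-force enumeration of room-visit sequences. All names (`ROOMS`, `sample_worlds`, `synthesize`, `run_episode`) are illustrative assumptions. The sketch keeps the three ideas the abstract names: sample completions of the hidden state, pick the program that succeeds on the most samples, and re-synthesize after each new observation.

```python
import itertools
import random

ROOMS = ("A", "B", "C")  # hypothetical rooms; exactly one hides the key


def sample_worlds(observed_empty, n, rng):
    """Stand-in for a learned generative model (assumption): sample key
    locations uniformly from rooms not yet observed to be empty."""
    candidates = [r for r in ROOMS if r not in observed_empty]
    return [rng.choice(candidates) for _ in range(n)]


def succeeds(program, key_room):
    """A program is a sequence of rooms to visit; it succeeds on a sampled
    world if it visits the room that holds the key."""
    return key_room in program


def synthesize(worlds, max_len):
    """Brute-force 'synthesis' (assumption): enumerate visit sequences up to
    max_len and return the one that succeeds on the largest fraction of
    sampled worlds, preferring shorter programs on ties."""
    best, best_score = None, (-1.0, 0)
    for length in range(1, max_len + 1):
        for program in itertools.permutations(ROOMS, length):
            rate = sum(succeeds(program, w) for w in worlds) / len(worlds)
            score = (rate, -length)
            if score > best_score:
                best, best_score = program, score
    return best, best_score[0]


def run_episode(true_key_room, rng, n_samples=50, max_len=2):
    """Model-predictive loop: sample worlds, synthesize a program, execute
    only its first step, observe the result, then re-synthesize."""
    observed_empty = set()
    steps = 0
    while True:
        worlds = sample_worlds(observed_empty, n_samples, rng)
        program, _ = synthesize(worlds, max_len)
        room = program[0]
        steps += 1
        if room == true_key_room:
            return steps
        observed_empty.add(room)  # new observation shrinks the hidden state


if __name__ == "__main__":
    rng = random.Random(0)
    for key_room in ROOMS:
        print(key_room, run_episode(key_room, rng))
```

In this sketch, robustness to model uncertainty is approximated by scoring each candidate program against many sampled worlds rather than a single prediction. A real system would need a far richer program space and a trained model in place of the uniform sampler.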

Cite

Text

Yang et al. "Program Synthesis Guided Reinforcement Learning for Partially Observed Environments." Neural Information Processing Systems, 2021.

Markdown

[Yang et al. "Program Synthesis Guided Reinforcement Learning for Partially Observed Environments." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/yang2021neurips-program/)

BibTeX

@inproceedings{yang2021neurips-program,
  title     = {{Program Synthesis Guided Reinforcement Learning for Partially Observed Environments}},
  author    = {Yang, Yichen and Inala, Jeevana Priya and Bastani, Osbert and Pu, Yewen and Solar-Lezama, Armando and Rinard, Martin},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/yang2021neurips-program/}
}