Program Synthesis Guided Reinforcement Learning for Partially Observed Environments
Abstract
A key challenge for reinforcement learning is solving long-horizon planning problems. Recent work has leveraged programs to guide reinforcement learning in these settings. However, these approaches impose a high manual burden on the user, who must provide a guiding program for every new task. Partially observed environments further complicate the programming task because the program must implement a strategy that correctly, and ideally optimally, handles every possible configuration of the hidden regions of the environment. We propose a new approach, model predictive program synthesis (MPPS), that uses program synthesis to automatically generate the guiding programs. It trains a generative model to predict the unobserved portions of the world, and then synthesizes a program based on samples from this model in a way that is robust to its uncertainty. In our experiments, we show that our approach significantly outperforms non-program-guided approaches on a set of challenging benchmarks, including a 2D Minecraft-inspired environment where the agent must complete a complex sequence of subtasks to achieve its goal, and performs comparably to an agent guided by handcrafted programs. Our results demonstrate that our approach can obtain the benefits of program-guided reinforcement learning without requiring the user to provide a new guiding program for every new task.
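The synthesis step described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes a one-dimensional world of cells, a stand-in sampler that guesses hidden cells uniformly at random in place of the trained generative model, and "programs" that are simply tuples of cell indices to visit. It also assumes one possible reading of "robust to uncertainty": choose the candidate program that succeeds in the largest number of sampled worlds. All function names here are hypothetical.

```python
import random


def sample_hidden_worlds(observed, n_samples, rng):
    """Stand-in for a learned generative model (assumption): each unobserved
    cell (None) is filled with a uniform random guess, observed cells are kept."""
    return [
        [cell if cell is not None else rng.choice(["wall", "free"]) for cell in observed]
        for _ in range(n_samples)
    ]


def program_succeeds(program, world):
    """Toy semantics (assumption): a program is a tuple of cell indices to
    visit, and it succeeds if every visited cell is free in the given world."""
    return all(world[i] == "free" for i in program)


def synthesize(candidates, worlds):
    """Return the candidate that succeeds in the most sampled worlds.
    Ties go to the earliest candidate, since max() keeps the first maximum."""
    return max(candidates, key=lambda p: sum(program_succeeds(p, w) for w in worlds))


if __name__ == "__main__":
    rng = random.Random(0)
    observed = ["free", None, "free", "wall"]  # cell 1 is hidden from the agent
    worlds = sample_hidden_worlds(observed, 20, rng)
    candidates = [(0, 2), (0, 1), (3,)]
    print(synthesize(candidates, worlds))
```

In this toy setup the program `(0, 2)` touches only cells known to be free, so it succeeds in every sample, while `(0, 1)` depends on the hidden cell and `(3,)` always hits a wall. In a model predictive loop, the agent would repeat the sampling and selection as new observations arrive.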
Cite
Text
Yang et al. "Program Synthesis Guided Reinforcement Learning for Partially Observed Environments." Neural Information Processing Systems, 2021.
Markdown
[Yang et al. "Program Synthesis Guided Reinforcement Learning for Partially Observed Environments." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/yang2021neurips-program/)
BibTeX
@inproceedings{yang2021neurips-program,
title = {{Program Synthesis Guided Reinforcement Learning for Partially Observed Environments}},
author = {Yang, Yichen and Inala, Jeevana Priya and Bastani, Osbert and Pu, Yewen and Solar-Lezama, Armando and Rinard, Martin},
booktitle = {Neural Information Processing Systems},
year = {2021},
url = {https://mlanthology.org/neurips/2021/yang2021neurips-program/}
}