Towards Mixed Optimization for Reinforcement Learning with Program Synthesis
Abstract
Deep reinforcement learning has led to many recent breakthroughs, but the learned policies are often based on black-box neural networks, which makes them difficult to interpret and makes it hard to impose desired specification constraints during learning. We present an iterative framework, MORL, for improving the learned policies using program synthesis. Concretely, we propose to use synthesis techniques to obtain a symbolic representation of the learned policy, which can then be debugged manually or automatically using program repair. After the repair step, we use behavior cloning to obtain the policy corresponding to the repaired program, which is then further improved using gradient descent. This process continues until the learned policy satisfies desired constraints. We instantiate MORL for the simple CartPole problem and show that the programmatic representation allows for high-level modifications that in turn lead to improved learning of the policies.
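The abstract describes an iterative loop: train a policy, synthesize a symbolic program from it, repair the program against the desired constraints, clone the repaired program back into a policy, and fine-tune with gradient descent. Below is a minimal, self-contained sketch of that control flow on a toy 1-D stabilization task, not the paper's CartPole setup. All helper names (`train_policy`, `synthesize_program`, `repair_program`, `behavior_clone`), the toy dynamics, and the example constraint are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of the MORL loop from the abstract, on a toy task.
# Every helper below is an illustrative placeholder, not the paper's API.

rng = np.random.default_rng(0)

def rollout_return(w, episodes=20, horizon=50):
    """Average return of a linear threshold policy a = sign(w . s)."""
    total = 0.0
    for _ in range(episodes):
        s = rng.normal(size=2)
        for _ in range(horizon):
            a = 1.0 if w @ s > 0 else -1.0
            s = 0.9 * s + 0.1 * np.array([a, -a]) + 0.01 * rng.normal(size=2)
            total += -np.abs(s).sum()          # reward: stay near the origin
    return total / episodes

def train_policy(w, steps=100, sigma=0.1, lr=0.05):
    """Policy improvement (random-search stand-in for gradient-based RL)."""
    for _ in range(steps):
        eps = rng.normal(size=w.shape)
        if rollout_return(w + sigma * eps) > rollout_return(w):
            w = w + lr * eps
    return w

def synthesize_program(w):
    """'Synthesis': extract a human-readable rule from the policy weights."""
    return {"op": "if", "cond": ("dot", w.round(2)), "then": 1.0, "else": -1.0}

def repair_program(prog):
    """'Repair': impose a constraint on the symbolic rule; here the example
    specification is that the policy must ignore the second state feature."""
    w = prog["cond"][1].copy()
    w[1] = 0.0
    return {"op": "if", "cond": ("dot", w), "then": 1.0, "else": -1.0}

def behavior_clone(prog):
    """'Behavior cloning': recover policy weights from the repaired program
    (trivial here because the program is itself a linear rule)."""
    return prog["cond"][1].astype(float)

w = train_policy(rng.normal(size=2))            # initial RL training
prog = repair_program(synthesize_program(w))    # symbolic extraction + repair
w = train_policy(behavior_clone(prog))          # clone back, then fine-tune
print("final weights:", w, "return:", rollout_return(w))
```

The paper performs these steps on CartPole with actual synthesis and program-repair techniques; the skeleton above only mirrors the control flow of the loop.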
Cite
Text
Bhupatiraju et al. "Towards Mixed Optimization for Reinforcement Learning with Program Synthesis." ICML 2018 Workshops: NAMPI, 2018.
Markdown
[Bhupatiraju et al. "Towards Mixed Optimization for Reinforcement Learning with Program Synthesis." ICML 2018 Workshops: NAMPI, 2018.](https://mlanthology.org/icmlw/2018/bhupatiraju2018icmlw-mixed/)
BibTeX
@inproceedings{bhupatiraju2018icmlw-mixed,
title = {{Towards Mixed Optimization for Reinforcement Learning with Program Synthesis}},
author = {Bhupatiraju, Surya and Agrawal, Kumar Krishna and Singh, Rishabh},
booktitle = {ICML 2018 Workshops: NAMPI},
year = {2018},
url = {https://mlanthology.org/icmlw/2018/bhupatiraju2018icmlw-mixed/}
}