Black-Box Policy Search with Probabilistic Programs

Abstract

In this work, we explore how probabilistic programs can be used to represent policies in sequential decision problems. In this formulation, a probabilistic program is a black-box stochastic simulator for both the problem domain and the agent. We relate classic policy gradient techniques to recently introduced black-box variational methods which generalize to probabilistic program inference. We present case studies in the Canadian traveler problem, Rock Sample, and a benchmark for optimal diagnosis inspired by Guess Who. Each study illustrates how programs can efficiently represent policies using moderate numbers of parameters.
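As a rough illustration of the policy gradient idea the abstract refers to, the sketch below applies a score-function (likelihood-ratio) gradient estimator to a toy one-step problem. It is not the paper's method or code: the simulator, the Bernoulli policy, the reward, the mean-reward baseline, and all function names are assumptions made up for this example. The only thing it shares with the abstract is the black-box setting, in which the optimizer sees sampled actions and rewards and never differentiates through the simulator.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real-valued parameter to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(theta, rng):
    """Hypothetical black-box simulator (an assumption for this sketch).

    Samples an action from a Bernoulli policy with P(action=1) = sigmoid(theta)
    and returns (action, reward). Action 1 earns reward 1, action 0 earns 0.
    """
    p = sigmoid(theta)
    action = 1 if rng.random() < p else 0
    reward = 1.0 if action == 1 else 0.0
    return action, reward


def score_function_gradient(theta, rng, num_samples=100):
    """Monte Carlo estimate of d/dtheta E[reward] via the score function.

    For a Bernoulli policy with logit theta, d/dtheta log p(a | theta) = a - p.
    The mean sampled reward is used as a simple baseline (an assumed choice).
    """
    p = sigmoid(theta)
    samples = [simulate(theta, rng) for _ in range(num_samples)]
    baseline = sum(r for _, r in samples) / num_samples
    return sum((r - baseline) * (a - p) for a, r in samples) / num_samples


def policy_search(steps=200, step_size=0.5, seed=0):
    """Plain stochastic gradient ascent on the policy parameter."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        theta += step_size * score_function_gradient(theta, rng)
    return theta
```

Calling `policy_search()` returns a learned logit, and `sigmoid(policy_search())` gives the resulting probability of choosing the rewarded action, which should move toward 1 as the search proceeds.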

Cite

Text

van de Meent et al. "Black-Box Policy Search with Probabilistic Programs." International Conference on Artificial Intelligence and Statistics, 2016.

Markdown

[van de Meent et al. "Black-Box Policy Search with Probabilistic Programs." International Conference on Artificial Intelligence and Statistics, 2016.](https://mlanthology.org/aistats/2016/vandemeent2016aistats-black/)

BibTeX

@inproceedings{vandemeent2016aistats-black,
  title     = {{Black-Box Policy Search with Probabilistic Programs}},
  author    = {van de Meent, Jan-Willem and Paige, Brooks and Tolpin, David and Wood, Frank D.},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  year      = {2016},
  pages     = {1195--1204},
  url       = {https://mlanthology.org/aistats/2016/vandemeent2016aistats-black/}
}