Approximate Policy Iteration with a Policy Language Bias

Abstract

We explore approximate policy iteration, replacing the usual cost-function learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs.
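The abstract's key move is to replace the cost-function approximation step of approximate policy iteration with learning directly in policy space. Below is a minimal sketch of such a loop, under assumed generic interfaces: step(s, a) samples a (next_state, reward) transition, actions(s) enumerates legal actions, and fit_policy stands in for a policy-language learner. All names here are illustrative, not the authors' implementation.

import random

def rollout_value(state, policy, step, horizon, gamma):
    """Discounted return of one simulated trajectory following `policy`."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward = step(state, policy(state))
        total += discount * reward
        discount *= gamma
    return total

def improved_action(state, policy, step, actions, horizon, gamma, n_traj):
    """Rollout estimate of the improved policy's choice at `state`:
    greedy one-step lookahead over Monte Carlo rollouts of `policy`."""
    def q(a):
        total = 0.0
        for _ in range(n_traj):
            nxt, reward = step(state, a)
            total += reward + gamma * rollout_value(nxt, policy, step,
                                                    horizon, gamma)
        return total / n_traj
    return max(actions(state), key=q)

def approximate_policy_iteration(train_states, step, actions, fit_policy,
                                 iters=10, horizon=50, gamma=0.95, n_traj=20):
    """API loop whose learning step happens in policy space: sampled states
    are labeled with rollout-improved actions, and fit_policy generalizes
    from those examples instead of fitting a cost function."""
    policy = lambda s: random.choice(actions(s))  # arbitrary initial policy
    for _ in range(iters):
        examples = [(s, improved_action(s, policy, step, actions,
                                        horizon, gamma, n_traj))
                    for s in train_states]
        policy = fit_policy(examples)  # no cost-function approximation step
    return policy

In the paper, the policy-language bias enters through the learner: the authors fit decision-list policies expressed in a taxonomic concept language over relational MDP states, which the fit_policy stub abstracts away here; the rollout labeling above is a standard Monte Carlo policy-improvement step.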

Cite

Text

Fern et al. "Approximate Policy Iteration with a Policy Language Bias." Neural Information Processing Systems, 2003.

Markdown

[Fern et al. "Approximate Policy Iteration with a Policy Language Bias." Neural Information Processing Systems, 2003.](https://mlanthology.org/neurips/2003/fern2003neurips-approximate/)

BibTeX

@inproceedings{fern2003neurips-approximate,
  title     = {{Approximate Policy Iteration with a Policy Language Bias}},
  author    = {Fern, Alan and Yoon, Sungwook and Givan, Robert},
  booktitle = {Neural Information Processing Systems},
  year      = {2003},
  pages     = {847--854},
  url       = {https://mlanthology.org/neurips/2003/fern2003neurips-approximate/}
}