Approximate Policy Iteration with a Policy Language Bias
Abstract
We explore approximate policy iteration, replacing the usual cost-function learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs.
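To make the abstract's core idea concrete, here is a minimal sketch of one approximate policy iteration step where improvement happens in policy space rather than value space: sampled states are labeled with rollout-greedy actions, and a new policy is induced directly from that labeled set. This is an illustration, not the paper's algorithm; the paper learns relational decision lists in a restricted policy language, whereas here a generic `learn` callback stands in, and `simulate`, `actions`, and `reward` are hypothetical helpers.

```python
def policy_rollout(state, action, policy, simulate, reward, horizon, width):
    """Estimate the value of taking `action` in `state` by sampling
    `width` trajectories that take `action` once, then follow `policy`
    for `horizon` steps, averaging the accumulated reward."""
    total = 0.0
    for _ in range(width):
        s = simulate(state, action)   # one stochastic transition
        ret = reward(s)
        for _ in range(horizon):
            s = simulate(s, policy(s))
            ret += reward(s)
        total += ret
    return total / width

def improve_policy(policy, states, actions, simulate, reward,
                   learn, horizon=20, width=10):
    """One API iteration: label sampled states with rollout-greedy
    actions, then induce a new policy from the labeled set. The
    learning step is in policy space (states -> actions), so no
    cost function is ever fit."""
    dataset = []
    for s in states:
        best = max(actions(s),
                   key=lambda a: policy_rollout(s, a, policy, simulate,
                                                reward, horizon, width))
        dataset.append((s, best))
    return learn(dataset)  # e.g. fit a classifier mapping states to actions
```

The design point the paper exploits is that `learn` can be biased toward a compact policy language, which is what lets the approach scale to relational MDPs far too large for value-function representations.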
Cite
Text
Fern et al. "Approximate Policy Iteration with a Policy Language Bias." Neural Information Processing Systems, 2003.
Markdown
[Fern et al. "Approximate Policy Iteration with a Policy Language Bias." Neural Information Processing Systems, 2003.](https://mlanthology.org/neurips/2003/fern2003neurips-approximate/)
BibTeX
@inproceedings{fern2003neurips-approximate,
  title = {{Approximate Policy Iteration with a Policy Language Bias}},
  author = {Fern, Alan and Yoon, Sungwook and Givan, Robert},
  booktitle = {Neural Information Processing Systems},
  year = {2003},
  pages = {847-854},
  url = {https://mlanthology.org/neurips/2003/fern2003neurips-approximate/}
}