Inductive Policy Selection for First-Order MDPs

Abstract

We select policies for large Markov Decision Processes (MDPs) with compact first-order representations. We find policies that generalize well as the number of objects in the domain grows, potentially without bound. Existing dynamic-programming approaches based on flat, propositional, or first-order representations either are impractical here or do not naturally scale as the number of objects grows without bound. We implement and evaluate an alternative approach that induces first-order policies using training data constructed by solving small problem instances with PGraphplan (Blum & Langford, 1999). Our policies are represented as ensembles of decision lists, using a taxonomic concept language. This approach extends the work of Martin and Geffner (2000) to stochastic domains, ensemble learning, and a wider variety of problems. Empirically, we find "good" policies for several stochastic first-order MDPs that are beyond the scope of previous approaches. We also discuss the application of this work to the relational reinforcement-learning problem.
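The decision-list policy representation mentioned in the abstract can be illustrated with a small sketch. The Python toy below is our own illustrative assumption, not code or notation from the paper: each rule pairs a concept expression (a set-valued query over the current state, standing in for a taxonomic concept) with an action schema, and the policy fires the first rule whose concept denotes a non-empty set of objects.

# Hypothetical sketch of a decision-list policy over set-valued concepts.
# The blocks-world encoding, concept names, and action names are illustrative
# assumptions, not taken from the paper.

def clear_blocks(state):
    """Blocks with nothing stacked on top of them (toy concept)."""
    return {b for b in state["blocks"]
            if not any(x for (x, y) in state["on"] if y == b)}

def misplaced_blocks(state):
    """Blocks not yet on their goal destination (toy concept)."""
    return {b for (b, dest) in state["goal_on"].items()
            if (b, dest) not in state["on"]}

# Ordered rules: the first rule whose concept is non-empty selects the action.
DECISION_LIST = [
    (lambda s: clear_blocks(s) & misplaced_blocks(s), "move-to-goal"),
    (lambda s: clear_blocks(s), "set-aside"),
]

def apply_policy(state):
    """Return the first applicable (action, object) pair, or None."""
    for concept, action in DECISION_LIST:
        candidates = concept(state)
        if candidates:
            return action, next(iter(candidates))
    return None

if __name__ == "__main__":
    state = {
        "blocks": {"a", "b", "c"},
        "on": {("a", "b")},      # block a sits on block b
        "goal_on": {"a": "c"},   # goal: a on c
    }
    print(apply_policy(state))   # -> ("move-to-goal", "a") in this toy encoding

Because the concepts quantify over whatever objects the state contains, the same rule list applies unchanged as the number of blocks grows, which is the sense in which such policies generalize across problem sizes.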

Cite

Text

Yoon et al. "Inductive Policy Selection for First-Order MDPs." Conference on Uncertainty in Artificial Intelligence, 2002.

Markdown

[Yoon et al. "Inductive Policy Selection for First-Order MDPs." Conference on Uncertainty in Artificial Intelligence, 2002.](https://mlanthology.org/uai/2002/yoon2002uai-inductive/)

BibTeX

@inproceedings{yoon2002uai-inductive,
  title     = {{Inductive Policy Selection for First-Order MDPs}},
  author    = {Yoon, Sung Wook and Fern, Alan and Givan, Robert},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2002},
  pages     = {568--576},
  url       = {https://mlanthology.org/uai/2002/yoon2002uai-inductive/}
}