Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains

Abstract

Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains only. Without extensive feature engineering, it is difficult -- if not impossible -- to apply them within structured domains, in which, e.g., the number of objects and the relations among them vary. In this paper, we describe a non-parametric policy gradient approach -- called NPPG -- that overcomes this limitation. The key idea is to apply Friedman's gradient boosting: policies are represented as a weighted sum of regression models grown in a stage-wise optimization. Employing off-the-shelf regression learners, NPPG can deal with propositional, continuous, and relational domains in a unified way. Our experimental results show that it can even improve on established results.
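The core idea -- a softmax policy whose potential function is a stage-wise sum of regression models, updated by fitting each new model to the point-wise functional gradient of the expected return -- can be sketched as follows. This is a minimal illustration, not the authors' implementation: the contextual-bandit environment, the feature encoding, and all hyperparameters (learning rate, tree depth, batch size) are assumptions chosen for the example; the paper's "off-the-shelf regression learner" is instantiated here with a scikit-learn decision tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def features(s, a):
    # Hypothetical joint state-action encoding; in NPPG this could be
    # propositional, continuous, or relational, depending on the regressor.
    return np.array([s, float(a), s * a, 1.0])

class BoostedPolicy:
    """Softmax policy over a potential Psi represented as a boosted ensemble."""

    def __init__(self, actions, lr=0.5):
        self.actions = actions
        self.lr = lr
        self.models = []  # regression models grown stage-wise

    def psi(self, s, a):
        x = features(s, a).reshape(1, -1)
        return sum(self.lr * m.predict(x)[0] for m in self.models)

    def probs(self, s):
        z = np.array([self.psi(s, a) for a in self.actions])
        z -= z.max()  # numerical stability
        e = np.exp(z)
        return e / e.sum()

def gradient_batch(policy, n=200):
    """Sample interactions and collect point-wise functional-gradient targets."""
    X, G, total = [], [], 0.0
    for _ in range(n):
        s = rng.uniform(-1, 1)
        p = policy.probs(s)
        a = rng.choice(policy.actions, p=p)
        # Toy reward: action 1 is correct exactly when s > 0.
        r = 1.0 if (a == 1) == (s > 0) else 0.0
        total += r
        # d log pi(a|s) / d Psi(s,b) = 1[b == a] - pi(b|s), scaled by reward
        for i, b in enumerate(policy.actions):
            X.append(features(s, b))
            G.append(r * ((1.0 if b == a else 0.0) - p[i]))
    return np.array(X), np.array(G), total / n

policy = BoostedPolicy(actions=[0, 1])
for _ in range(30):
    X, G, _ = gradient_batch(policy)
    # Stage-wise step: fit a regressor to the functional gradient, add it in.
    policy.models.append(DecisionTreeRegressor(max_depth=3).fit(X, G))

_, _, final_avg = gradient_batch(policy, n=500)
```

Because each boosting stage only requires fitting a regressor to gradient targets, swapping the tree for a relational regression learner (as the paper does) changes nothing else in the loop.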

Cite

Text

Kersting and Driessens. "Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains." International Conference on Machine Learning, 2008. doi:10.1145/1390156.1390214

Markdown

[Kersting and Driessens. "Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains." International Conference on Machine Learning, 2008.](https://mlanthology.org/icml/2008/kersting2008icml-non/) doi:10.1145/1390156.1390214

BibTeX

@inproceedings{kersting2008icml-non,
  title     = {{Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains}},
  author    = {Kersting, Kristian and Driessens, Kurt},
  booktitle = {International Conference on Machine Learning},
  year      = {2008},
  pages     = {456--463},
  doi       = {10.1145/1390156.1390214},
  url       = {https://mlanthology.org/icml/2008/kersting2008icml-non/}
}