Algorithms for Learning Markov Field Policies

Abstract

We present a new graph-based approach for incorporating domain knowledge into reinforcement learning applications. The domain knowledge is given as a weighted graph, or a kernel matrix, that loosely indicates which states should have similar optimal actions. We first introduce a bias into the policy search process by deriving a distribution on policies such that policies that disagree with the provided graph have low probabilities. This distribution corresponds to a Markov Random Field. We then present a reinforcement learning and an apprenticeship learning algorithm for finding such policy distributions. We illustrate the advantage of the proposed approach on three problems: swing-up cart-balancing with nonuniform and smooth friction, gridworlds, and teaching a robot to grasp new objects.
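To make the core idea concrete, here is a minimal sketch (not the paper's exact formulation) of an unnormalized Markov Random Field prior over deterministic policies: states joined by a high-weight edge in the provided graph are encouraged to share the same action, so policies that disagree with the graph receive low probability. The function name `mrf_log_prior` and the toy chain graph are hypothetical, introduced only for illustration.

```python
import numpy as np

def mrf_log_prior(policy, weights):
    """Unnormalized log-probability of a deterministic policy under a
    hypothetical MRF prior built from the domain-knowledge graph.

    policy  : (n_states,) array of action indices, policy[i] = action in state i
    weights : (n_states, n_states) symmetric graph/kernel matrix; weights[i, j]
              loosely indicates that states i and j should share an action
    """
    # Pairwise indicator: 1 where two states are assigned the same action.
    same_action = (policy[:, None] == policy[None, :]).astype(float)
    # Pairwise potentials reward agreement along heavy edges; the factor 0.5
    # counts each undirected edge once.
    return 0.5 * np.sum(weights * same_action)

# Toy usage: three states on a chain graph. The policy that assigns the same
# action to neighboring states scores higher than one that alternates.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
smooth = np.array([0, 0, 0])
rough = np.array([0, 1, 0])
assert mrf_log_prior(smooth, W) > mrf_log_prior(rough, W)
```

Exponentiating and normalizing this score over all policies would yield the kind of distribution the abstract describes, in which graph-violating policies carry low probability.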

Cite

Text

Boularias et al. "Algorithms for Learning Markov Field Policies." Neural Information Processing Systems, 2012.

Markdown

[Boularias et al. "Algorithms for Learning Markov Field Policies." Neural Information Processing Systems, 2012.](https://mlanthology.org/neurips/2012/boularias2012neurips-algorithms/)

BibTeX

@inproceedings{boularias2012neurips-algorithms,
  title     = {{Algorithms for Learning Markov Field Policies}},
  author    = {Boularias, Abdeslam and Peters, Jan R. and Kroemer, Oliver B.},
  booktitle = {Neural Information Processing Systems},
  year      = {2012},
  pages     = {2177--2185},
  url       = {https://mlanthology.org/neurips/2012/boularias2012neurips-algorithms/}
}