Policy Search in Reproducing Kernel Hilbert Space

Abstract

Modeling policies in a reproducing kernel Hilbert space (RKHS) renders policy gradient reinforcement learning algorithms non-parametric. As a result, the policies become very flexible and gain rich representational potential without a pre-defined set of features. However, their performance might be either non-covariant under re-parameterization of the chosen kernel, or very sensitive to step-size selection. In this paper, we use a general framework to derive a new RKHS policy search technique. The derivation leads to both a natural RKHS actor-critic algorithm and an RKHS expectation-maximization (EM) policy search algorithm. Further, we show that kernelization enables learning in partially observable (POMDP) tasks, which is considered daunting for parametric approaches. Via sparsification, a small set of "support vectors" representing the history is shown to be effectively discovered. For evaluation, we use three simulated (PO)MDP reinforcement learning tasks and a simulated robotic manipulation task on a PR2. The results demonstrate the effectiveness of the new RKHS policy search framework in comparison to plain RKHS actor-critic, episodic natural actor-critic, plain actor-critic, and PoWER approaches.
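
To make the non-parametric idea concrete, below is a minimal sketch of an RKHS Gaussian policy whose mean is a growing kernel expansion h(s) = sum_i alpha_i k(s_i, s), updated by a vanilla functional policy-gradient step. All names here (`RKHSPolicy`, `vanilla_update`, the RBF kernel, the step size and noise level) are illustrative assumptions: this corresponds to a plain RKHS actor-critic-style baseline, not the paper's natural or EM updates, and it omits the sparsification the paper uses to keep the support set small.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * bandwidth ** 2))

class RKHSPolicy:
    """Gaussian policy a ~ N(h(s), sigma^2) whose mean h lives in an RKHS:
    h(s) = sum_i alpha_i * k(s_i, s). Its 'parameters' are the stored
    support states s_i and weights alpha_i, not a fixed feature vector."""

    def __init__(self, sigma=0.2, bandwidth=1.0, seed=0):
        self.centres = []   # support states s_i
        self.alphas = []    # functional weights alpha_i
        self.sigma = sigma
        self.bandwidth = bandwidth
        self.rng = np.random.default_rng(seed)

    def mean(self, state):
        return sum(a * rbf_kernel(c, state, self.bandwidth)
                   for c, a in zip(self.centres, self.alphas))

    def sample(self, state):
        return self.mean(state) + self.sigma * self.rng.normal()

    def vanilla_update(self, state, action, advantage, lr=0.1):
        # Plain (non-natural) functional policy-gradient step: the gradient
        # of log N(a; h(s), sigma^2) w.r.t. h is ((a - h(s)) / sigma^2) k(s, .),
        # so each update appends one new kernel centred at the visited state.
        weight = lr * advantage * (action - self.mean(state)) / self.sigma ** 2
        self.centres.append(np.asarray(state, dtype=float))
        self.alphas.append(weight)
```

Because every update appends a new centre, the expansion grows with the data; this is where the paper's sparsification step would prune the support set to a small number of representative "support vectors".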

Cite

Text

Vien et al. "Policy Search in Reproducing Kernel Hilbert Space." International Joint Conference on Artificial Intelligence, 2016.

Markdown

[Vien et al. "Policy Search in Reproducing Kernel Hilbert Space." International Joint Conference on Artificial Intelligence, 2016.](https://mlanthology.org/ijcai/2016/vien2016ijcai-policy/)

BibTeX

@inproceedings{vien2016ijcai-policy,
  title     = {{Policy Search in Reproducing Kernel Hilbert Space}},
  author    = {Vien, Ngo Anh and Englert, Peter and Toussaint, Marc},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2016},
  pages     = {2089--2096},
  url       = {https://mlanthology.org/ijcai/2016/vien2016ijcai-policy/}
}