Kernel-Based Reinforcement Learning on Representative States

Abstract

Markov decision processes (MDPs) are an established framework for solving sequential decision-making problems under uncertainty. In this work, we propose a new method for batch-mode reinforcement learning (RL) with continuous state variables. The method is an approximation to kernel-based RL on a set of k representative states. Similarly to kernel-based RL, our solution is a fixed point of a kernelized Bellman operator and can approximate the optimal solution to an arbitrary level of granularity. Unlike kernel-based RL, our method is fast. In particular, our policies can be computed in O(n) time, where n is the number of training examples. The time complexity of kernel-based RL is Ω(n²). We introduce our method, analyze its convergence, and compare it to existing work. The method is evaluated on two existing control problems with 2 to 4 continuous variables and a new problem with 64 variables. In all cases, we outperform state-of-the-art results and offer simpler solutions.
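The abstract describes the core idea at a high level: a kernelized Bellman operator whose backups are evaluated only on a small set of k representative states, which is what makes policies computable in O(n) time. The sketch below, in NumPy, illustrates what such a backup could look like. It is a minimal illustration, not the authors' algorithm: the Gaussian kernel, the nearest-representative assignment of successor states, and all function and parameter names are assumptions made for the example.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=0.5):
    """Gaussian smoothing kernel between two state vectors (illustrative choice)."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2.0 * bandwidth ** 2))


def kbrl_on_representative_states(transitions, rep_states, n_actions,
                                  gamma=0.95, bandwidth=0.5, n_iters=200):
    """Approximate value iteration with a kernelized Bellman backup evaluated
    only on k representative states (hypothetical sketch, not the paper's method).

    transitions: list of (s, a, r, s_next) tuples from the batch data.
    rep_states:  list/array of k representative states.
    Returns a (k, n_actions) array of Q-values on the representative states.
    """
    rep_states = [np.asarray(rs) for rs in rep_states]
    k = len(rep_states)
    Q = np.zeros((k, n_actions))

    weights, rewards, next_idx = [], [], []
    for a in range(n_actions):
        # Assumes every action appears at least once in the batch.
        trans_a = [(s, r, sn) for (s, ai, r, sn) in transitions if ai == a]
        W = np.array([[gaussian_kernel(rs, s, bandwidth) for (s, _, _) in trans_a]
                      for rs in rep_states])
        W /= W.sum(axis=1, keepdims=True)  # normalized kernel weights per rep state
        weights.append(W)
        rewards.append(np.array([r for (_, r, _) in trans_a]))
        # Assumption: tie each observed successor state to its nearest
        # representative state so the backup stays on the k-point grid.
        next_idx.append(np.array([
            int(np.argmin([np.sum((np.asarray(sn) - rs) ** 2) for rs in rep_states]))
            for (_, _, sn) in trans_a]))

    # Repeatedly apply the kernelized Bellman backup until (approximate) convergence.
    for _ in range(n_iters):
        V = Q.max(axis=1)  # greedy state values at the representative states
        for a in range(n_actions):
            Q[:, a] = weights[a] @ (rewards[a] + gamma * V[next_idx[a]])
    return Q
```

In this sketch, once the kernel weights are precomputed, each backup sweep costs O(kn), which reflects the linear-in-n flavor claimed in the abstract; the paper's exact construction and convergence analysis are given in the full text.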

Cite

Text

Kveton and Theocharous. "Kernel-Based Reinforcement Learning on Representative States." AAAI Conference on Artificial Intelligence, 2012. doi:10.1609/AAAI.V26I1.8294

Markdown

[Kveton and Theocharous. "Kernel-Based Reinforcement Learning on Representative States." AAAI Conference on Artificial Intelligence, 2012.](https://mlanthology.org/aaai/2012/kveton2012aaai-kernel/) doi:10.1609/AAAI.V26I1.8294

BibTeX

@inproceedings{kveton2012aaai-kernel,
  title     = {{Kernel-Based Reinforcement Learning on Representative States}},
  author    = {Kveton, Branislav and Theocharous, Georgios},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2012},
  pages     = {977--983},
  doi       = {10.1609/AAAI.V26I1.8294},
  url       = {https://mlanthology.org/aaai/2012/kveton2012aaai-kernel/}
}