Approximate Kalman Filter Q-Learning for Continuous State-Space MDPs

Abstract

We seek to learn an effective policy for a Markov Decision Process (MDP) with continuous states via Q-Learning. Given a set of basis functions over state-action pairs, we search for a corresponding set of linear weights that minimizes the mean Bellman residual. Our algorithm uses a Kalman filter model to estimate those weights, and we have developed a simpler approximate Kalman filter model that outperforms current state-of-the-art projected TD-Learning methods on several standard benchmark problems.
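The abstract describes estimating the linear weights of a Q-function approximator with a Kalman filter. The sketch below is an illustrative, minimal version of that idea, not the authors' exact algorithm: the weight vector is treated as the hidden state of a Kalman filter with random-walk dynamics, and each Bellman target serves as a noisy scalar measurement of phi(s, a)^T w. All class and parameter names (`KalmanQLearner`, `process_var`, `obs_var`) are assumptions for illustration.

```python
# Minimal sketch of Kalman-filter-based Q-learning with linear function
# approximation. This is an assumed, simplified formulation, not the
# approximate model developed in the paper.
import numpy as np

class KalmanQLearner:
    def __init__(self, num_features, process_var=1e-4, obs_var=1.0, gamma=0.99):
        self.w = np.zeros(num_features)               # weight (state) estimate
        self.P = np.eye(num_features)                 # weight covariance estimate
        self.Q = process_var * np.eye(num_features)   # random-walk process noise
        self.R = obs_var                              # observation noise variance
        self.gamma = gamma

    def q_value(self, phi):
        """Linear Q-value estimate for a feature vector phi(s, a)."""
        return phi @ self.w

    def update(self, phi_sa, reward, next_phis, terminal=False):
        """One Kalman update from a transition (s, a, r, s').

        phi_sa:    feature vector phi(s, a)
        next_phis: feature vectors phi(s', a') for each available action a'
        """
        # Bellman target used as the noisy measurement of phi(s, a)^T w.
        target = reward
        if not terminal:
            target += self.gamma * max(phi @ self.w for phi in next_phis)

        # Predict step: random-walk dynamics on the weights.
        self.P = self.P + self.Q

        # Correct step: scalar-observation Kalman update.
        innovation = target - phi_sa @ self.w
        s = phi_sa @ self.P @ phi_sa + self.R         # innovation variance
        k = (self.P @ phi_sa) / s                     # Kalman gain
        self.w = self.w + k * innovation
        self.P = self.P - np.outer(k, phi_sa @ self.P)
```

Compared with projected TD-Learning, a Kalman treatment of this kind also tracks a covariance over the weights, which acts as a per-feature, data-dependent step size rather than a single global learning rate.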

Cite

Text

Tripp and Shachter. "Approximate Kalman Filter Q-Learning for Continuous State-Space MDPs." Conference on Uncertainty in Artificial Intelligence, 2013.

Markdown

[Tripp and Shachter. "Approximate Kalman Filter Q-Learning for Continuous State-Space MDPs." Conference on Uncertainty in Artificial Intelligence, 2013.](https://mlanthology.org/uai/2013/tripp2013uai-approximate/)

BibTeX

@inproceedings{tripp2013uai-approximate,
  title     = {{Approximate Kalman Filter Q-Learning for Continuous State-Space MDPs}},
  author    = {Tripp, Charles and Shachter, Ross D.},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2013},
  url       = {https://mlanthology.org/uai/2013/tripp2013uai-approximate/}
}