Online Expectation Maximization for Reinforcement Learning in POMDPs

Abstract

We present online nested expectation maximization for model-free reinforcement learning in POMDPs. The algorithm evaluates the policy only on the current learning episode, discarding the episode after evaluation and retaining only a sufficient statistic, from which the policy is computed in closed form. As a result, the online algorithm has O(n) time complexity and O(1) memory complexity, compared to O(n^2) and O(n) for the corresponding batch-mode algorithm, where n is the number of learning episodes. The online algorithm, which is provably convergent, is demonstrated on five benchmark POMDP problems.

Cite

Text

Liu et al. "Online Expectation Maximization for Reinforcement Learning in POMDPs." International Joint Conference on Artificial Intelligence, 2013.

Markdown

[Liu et al. "Online Expectation Maximization for Reinforcement Learning in POMDPs." International Joint Conference on Artificial Intelligence, 2013.](https://mlanthology.org/ijcai/2013/liu2013ijcai-online/)

BibTeX

@inproceedings{liu2013ijcai-online,
  title     = {{Online Expectation Maximization for Reinforcement Learning in POMDPs}},
  author    = {Liu, Miao and Liao, Xuejun and Carin, Lawrence},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2013},
  pages     = {1501--1507},
  url       = {https://mlanthology.org/ijcai/2013/liu2013ijcai-online/}
}