Online Expectation Maximization for Reinforcement Learning in POMDPs

Abstract

We present online nested expectation maximization for model-free reinforcement learning in POMDPs. The algorithm evaluates the policy only on the current learning episode, discarding the episode after evaluation and retaining only a sufficient statistic, from which the policy is computed in closed form. As a result, the online algorithm has O(n) time complexity and O(1) memory complexity, compared to O(n^2) and O(n) for the corresponding batch-mode algorithm, where n is the number of learning episodes. The online algorithm, which is provably convergent, is demonstrated on five benchmark POMDP problems.

Cite

Text

Liu et al. "Online Expectation Maximization for Reinforcement Learning in POMDPs." International Joint Conference on Artificial Intelligence, 2013.

Markdown

[Liu et al. "Online Expectation Maximization for Reinforcement Learning in POMDPs." International Joint Conference on Artificial Intelligence, 2013.](https://mlanthology.org/ijcai/2013/liu2013ijcai-online/)

BibTeX

@inproceedings{liu2013ijcai-online,
  title     = {{Online Expectation Maximization for Reinforcement Learning in POMDPs}},
  author    = {Liu, Miao and Liao, Xuejun and Carin, Lawrence},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2013},
  pages     = {1501--1507},
  url       = {https://mlanthology.org/ijcai/2013/liu2013ijcai-online/}
}