An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
Abstract
We consider the learning problem in an online Markov decision process (MDP), whose goal is to learn the time-dependent decision-making policy of an agent that minimizes the regret, i.e., the performance difference from the best fixed policy. The difficulty of online MDP learning is that the reward function changes over time. In this paper, we show that a simple online policy gradient algorithm achieves regret $O(\sqrt{T})$ for $T$ steps under a certain concavity assumption and $O(\log T)$ under a strong concavity assumption. To the best of our knowledge, this is the first work to give an online MDP algorithm that can handle continuous state, action, and parameter spaces with a regret guarantee. We also illustrate the behavior of the online policy gradient method through experiments.
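To make the setting concrete, below is a minimal sketch of the online policy gradient idea the abstract describes: at each round the reward function may change, the agent runs its current stochastic policy, forms a gradient estimate, and takes a projected gradient step. The toy environment, the linear-Gaussian policy, the REINFORCE-style estimator, and the step-size choices here are illustrative assumptions, not the paper's exact algorithm or analysis; the $\eta_t \propto 1/\sqrt{t}$ schedule matches the concave $O(\sqrt{T})$ regime, while $\eta_t \propto 1/t$ would be the usual choice under strong concavity.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, reward_fn, horizon=20, sigma=0.5):
    """Run one trajectory with a linear-Gaussian policy a ~ N(theta * s, sigma^2).

    Returns the total reward and a REINFORCE-style gradient estimate
    (sum of log-likelihood gradients, scaled by the total reward).
    """
    s, total, grad = 1.0, 0.0, 0.0
    for _ in range(horizon):
        a = theta * s + sigma * rng.standard_normal()
        total += reward_fn(s, a)
        grad += (a - theta * s) * s / sigma**2       # d/dtheta log pi(a|s)
        s = np.clip(0.9 * s + 0.1 * a, -5.0, 5.0)    # toy continuous dynamics
    return total, grad * total                        # REINFORCE estimate

theta, T = 0.0, 500
for t in range(1, T + 1):
    # Time-varying reward: the target drifts each round, as in an online MDP.
    target = np.sin(2 * np.pi * t / T)
    reward_fn = lambda s, a: -(a - target * s) ** 2
    _, g = rollout(theta, reward_fn)
    eta = 1.0 / np.sqrt(t)                            # step size for the O(sqrt(T)) regime
    theta = np.clip(theta + eta * g, -2.0, 2.0)       # projected gradient ascent step
print("final theta:", theta)
```

The projection step (here a simple clip onto a bounded parameter interval) stands in for keeping the policy parameter in a compact feasible set, which such regret analyses typically require.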
Cite
Ma et al. "An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2014. doi:10.1007/978-3-662-44851-9_23
@inproceedings{ma2014ecmlpkdd-online,
title = {{An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions}},
author = {Ma, Yao and Zhao, Tingting and Hatano, Kohei and Sugiyama, Masashi},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2014},
pages = {354--369},
doi = {10.1007/978-3-662-44851-9_23},
url = {https://mlanthology.org/ecmlpkdd/2014/ma2014ecmlpkdd-online/}
}