Improved Strong Worst-Case Upper Bounds for MDP Planning

Gupta, Anchit; Kalyanakrishnan, Shivaram

doi:10.24963/IJCAI.2017/248

Improved Strong Worst-Case Upper Bounds for MDP Planning

Anchit Gupta, Shivaram Kalyanakrishnan

IJCAI 2017 pp. 1788-1794

doi:10.24963/IJCAI.2017/248 /ijcai/2017/gupta2017ijcai-improved/

Abstract

The Markov Decision Problem (MDP) plays a central role in AI as an abstraction of sequential decision making. We contribute to the theoretical analysis of MDP PLANNING, which is the problem of computing an optimal policy for a given MDP. Specifically, we furnish improved STRONG WORST-CASE upper bounds on the running time of MDP planning. Strong bounds are those that depend only on the number of states n and the number of actions k in the specified MDP; they have no dependence on affiliated variables such as the discount factor and the number of bits needed to represent the MDP. Worst-case bounds apply to EVERY run of an algorithm; randomised algorithms can typically yield faster EXPECTED running times. While the special case of 2-action MDPs (that is, k = 2) has recently received some attention, bounds for general k have remained to be improved for several decades. Our contributions are to this general case. For k >= 3, the tightest strong upper bound shown to date for MDP planning belongs to a family of algorithms called Policy Iteration. This bound is only a polynomial improvement over a trivial bound of poly(n, k) k^n [Mansour and Singh, 1999]. In this paper, we generalise a contrasting algorithm called the Fibonacci Seesaw, and derive a bound of poly(n, k) k^0.6834n. The key construct we use is a template to map algorithms for the 2-action setting to the general setting. Interestingly, this idea can also be used to design Policy Iteration algorithms with a running time upper bound of poly(n, k) k^0.7207n. Both our results improve upon bounds that have stood for several decades.

PDF IJCAI Semantic Scholar

Cite

Text

Gupta and Kalyanakrishnan. "Improved Strong Worst-Case Upper Bounds for MDP Planning." International Joint Conference on Artificial Intelligence, 2017. doi:10.24963/IJCAI.2017/248

Markdown

[Gupta and Kalyanakrishnan. "Improved Strong Worst-Case Upper Bounds for MDP Planning." International Joint Conference on Artificial Intelligence, 2017.](https://mlanthology.org/ijcai/2017/gupta2017ijcai-improved/) doi:10.24963/IJCAI.2017/248

BibTeX

@inproceedings{gupta2017ijcai-improved,
  title     = {{Improved Strong Worst-Case Upper Bounds for MDP Planning}},
  author    = {Gupta, Anchit and Kalyanakrishnan, Shivaram},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {1788-1794},
  doi       = {10.24963/IJCAI.2017/248},
  url       = {https://mlanthology.org/ijcai/2017/gupta2017ijcai-improved/}
}