Complexity of Probabilistic Planning Under Average Rewards

Abstract

A general and expressive model of sequential decision making under uncertainty is provided by the Markov decision process (MDP) framework. Complex applications with very large state spaces are best modelled implicitly, for example as precondition-effect operators, the representation used in AI planning. Representations of this kind are very powerful, but they also make the construction of policies/plans computationally very complex. Earlier work on the complexity of conditional and probabilistic planning in the general MDP/POMDP framework has concentrated on finite horizons and rewards/costs that are geometrically discounted. In many applications, average reward per unit time is a more relevant rationality criterion, and to provide a solid basis for the development of efficient planning algorithms, the computational complexity of the corresponding problems has to be analyzed. We investigate the complexity of finding average reward optimal plans/policies for MDPs, represented a...
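To illustrate the average-reward criterion the abstract refers to, here is a minimal sketch (not from the paper, where states are represented succinctly and such explicit computation is exactly what becomes intractable): fixing a policy turns an explicitly represented MDP into a Markov chain, and the average reward ("gain") is the expected per-step reward under that chain's stationary distribution. All numbers below are hypothetical.

```python
# Hypothetical 2-state Markov chain induced by a fixed policy.
P = [[0.9, 0.1],
     [0.5, 0.5]]            # transition probabilities under the policy
r = [1.0, 0.0]              # per-state rewards

# Power iteration for the stationary distribution pi, i.e. pi = pi P.
pi = [0.5, 0.5]
for _ in range(1000):
    pi = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]

# Average reward per unit time under the stationary distribution.
gain = sum(pi[j] * r[j] for j in range(2))
print(round(gain, 4))  # 5/6, i.e. 0.8333
```

For this chain the stationary distribution is (5/6, 1/6), so the gain is 5/6. Note this contrasts with geometric discounting, where early rewards count more; under the average-reward criterion only the long-run per-step rate matters.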

Cite

Text

Rintanen. "Complexity of Probabilistic Planning Under Average Rewards." International Joint Conference on Artificial Intelligence, 2001.

Markdown

[Rintanen. "Complexity of Probabilistic Planning Under Average Rewards." International Joint Conference on Artificial Intelligence, 2001.](https://mlanthology.org/ijcai/2001/rintanen2001ijcai-complexity/)

BibTeX

@inproceedings{rintanen2001ijcai-complexity,
  title     = {{Complexity of Probabilistic Planning Under Average Rewards}},
  author    = {Rintanen, Jussi},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2001},
  pages     = {503--508},
  url       = {https://mlanthology.org/ijcai/2001/rintanen2001ijcai-complexity/}
}