Approximating Optimal Policies for Partially Observable Stochastic Domains

Abstract

The problem of making optimal decisions under uncertainty is central to Artificial Intelligence. If the state of the world is known at all times, the world can be modeled as a Markov Decision Process (MDP). MDPs have been studied extensively, and many methods are known for determining optimal courses of action, or policies. The more realistic case, in which state information is only partially observable and the world is modeled as a Partially Observable Markov Decision Process (POMDP), has received much less attention. The best exact algorithms for these problems can be very inefficient in both space and time. We introduce Smooth Partially Observable Value Approximation (SPOVA), a new approximation method that can quickly yield good approximations which can improve over time. This method can be combined with reinforcement learning methods, a combination that was very effective in our test cases.
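The abstract's contrast between MDPs and POMDPs rests on the belief state: since the true state is hidden, an agent maintains a probability distribution over states and updates it by Bayes' rule after each action and observation. A smooth value approximation in the spirit of SPOVA replaces the piecewise-linear max over value vectors with a differentiable soft maximum. The sketch below is illustrative only: the two-state model, function names, and the particular soft-max form are assumptions for demonstration, not the paper's exact formulation.

```python
def belief_update(b, a, o, T, O):
    """Bayes filter over hidden states.

    b'(s') is proportional to O[a][s'][o] * sum_s T[a][s][s'] * b[s],
    where T[a][s][s'] is the transition model and O[a][s'][o] the
    observation model (both indexed as nested lists here).
    """
    n = len(b)
    unnorm = [O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
              for s2 in range(n)]
    z = sum(unnorm)  # probability of observing o; assumed nonzero
    return [p / z for p in unnorm]

def smooth_value(b, alphas, k=20):
    """Differentiable stand-in for max_i (alpha_i . b).

    The p-norm (sum_i (alpha_i . b)^k)^(1/k) approaches the max as
    k grows, provided the dot products are positive -- one way to make
    a POMDP value function amenable to gradient-based updates.
    """
    dots = [sum(ai * bi for ai, bi in zip(alpha, b)) for alpha in alphas]
    return sum(d ** k for d in dots) ** (1.0 / k)

# Toy two-state, one-action model (illustrative numbers).
T = [[[0.9, 0.1], [0.2, 0.8]]]   # T[a][s][s']
O = [[[0.7, 0.3], [0.4, 0.6]]]   # O[a][s'][o]
b = belief_update([0.5, 0.5], a=0, o=0, T=T, O=O)
v = smooth_value([0.5, 0.5], alphas=[[1.0, 0.0], [0.0, 2.0]])
```

After observing `o=0`, the belief shifts toward state 0 because that state is more likely to emit the observation; `smooth_value` returns a value just above the larger dot product (here 1.0), tightening as `k` increases.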

Cite

Text

Parr and Russell. "Approximating Optimal Policies for Partially Observable Stochastic Domains." International Joint Conference on Artificial Intelligence, 1995.

Markdown

[Parr and Russell. "Approximating Optimal Policies for Partially Observable Stochastic Domains." International Joint Conference on Artificial Intelligence, 1995.](https://mlanthology.org/ijcai/1995/parr1995ijcai-approximating/)

BibTeX

@inproceedings{parr1995ijcai-approximating,
  title     = {{Approximating Optimal Policies for Partially Observable Stochastic Domains}},
  author    = {Parr, Ronald and Russell, Stuart},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {1995},
  pages     = {1088--1095},
  url       = {https://mlanthology.org/ijcai/1995/parr1995ijcai-approximating/}
}