VDCBPI: An Approximate Scalable Algorithm for Large POMDPs

Abstract

Existing algorithms for discrete partially observable Markov decision processes can at best solve problems of a few thousand states due to two important sources of intractability: the curse of dimensionality and the policy space complexity. This paper describes a new algorithm (VDCBPI) that mitigates both sources of intractability by combining the Value Directed Compression (VDC) technique [13] with Bounded Pol- icy Iteration (BPI) [14]. The scalability of VDCBPI is demonstrated on synthetic network management problems with up to 33 million states.

Cite

Text

Poupart and Boutilier. "VDCBPI: An Approximate Scalable Algorithm for Large POMDPs." Neural Information Processing Systems, 2004.

Markdown

[Poupart and Boutilier. "VDCBPI: An Approximate Scalable Algorithm for Large POMDPs." Neural Information Processing Systems, 2004.](https://mlanthology.org/neurips/2004/poupart2004neurips-vdcbpi/)

BibTeX

@inproceedings{poupart2004neurips-vdcbpi,
  title     = {{VDCBPI: An Approximate Scalable Algorithm for Large POMDPs}},
  author    = {Poupart, Pascal and Boutilier, Craig},
  booktitle = {Neural Information Processing Systems},
  year      = {2004},
  pages     = {1081-1088},
  url       = {https://mlanthology.org/neurips/2004/poupart2004neurips-vdcbpi/}
}