Incremental Least Squares Policy Iteration for POMDPs

Abstract

We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision processes (POMDPs). The ILSPI algorithm computes a basis representation of the infinite-horizon value function by minimizing the squared Bellman residual, and performs policy improvement in reachable belief states. A number of optimal basis functions are determined by the algorithm to minimize the Bellman residual incrementally, via efficient computations. We show that, by using optimally determined basis functions, the policy can be improved successively on a set of most probable belief points sampled from the reachable belief set. As the ILSPI is based on belief sample points, it represents a point-based policy iteration method. The results on four benchmark problems show that the ILSPI compares competitively to its value-iteration counterparts in terms of both performance and computational efficiency.
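The core idea of the abstract, fitting a basis representation of the value function by least-squares minimization of the Bellman residual over sampled reachable belief points, can be illustrated with a small sketch. The POMDP below (two states, two observations, a single fixed-policy action), the polynomial basis `phi`, and all numerical values are illustrative assumptions, not taken from the paper; the sketch shows only the policy-evaluation step, not the full incremental basis selection or policy improvement of ILSPI.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95  # discount factor (assumed)

# Hypothetical 2-state, 2-observation POMDP under one fixed-policy action.
T = np.array([[0.9, 0.1], [0.2, 0.8]])   # T[s, s']: transition probabilities
Z = np.array([[0.8, 0.2], [0.3, 0.7]])   # Z[s', o]: observation probabilities
R = np.array([1.0, -1.0])                # per-state reward

def obs_prob_and_next(b):
    """Return P(o | b) and the Bayes-updated belief for each observation o."""
    pred = b @ T                          # predicted state distribution
    probs, nexts = [], []
    for o in range(Z.shape[1]):
        unnorm = Z[:, o] * pred
        p = unnorm.sum()
        probs.append(p)
        nexts.append(unnorm / p if p > 0 else pred)
    return np.array(probs), np.array(nexts)

def phi(b):
    # A simple polynomial basis over the belief simplex (an assumption;
    # ILSPI selects its basis functions incrementally).
    return np.array([1.0, b[0], b[0] ** 2])

# Sample reachable belief points by simulating belief updates.
beliefs = [np.array([0.5, 0.5])]
for _ in range(200):
    p, nxt = obs_prob_and_next(beliefs[-1])
    beliefs.append(nxt[rng.choice(len(p), p=p)])

# With V(b) = w . phi(b), the Bellman residual at b is
#   w . (phi(b) - gamma * sum_o P(o|b) phi(b^o)) - r(b),
# so minimizing its square over the samples is linear least squares in w.
A, r = [], []
for b in beliefs:
    p, nxt = obs_prob_and_next(b)
    expected_next_phi = sum(pi * phi(bo) for pi, bo in zip(p, nxt))
    A.append(phi(b) - gamma * expected_next_phi)
    r.append(b @ R)
w, *_ = np.linalg.lstsq(np.array(A), np.array(r), rcond=None)

def V(b):
    return phi(b) @ w
```

Because the residual is linear in the basis weights, each fit reduces to one `lstsq` solve over the sampled beliefs; in the full algorithm, this evaluation would alternate with point-based policy improvement at those belief points.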

Cite

Text

Li et al. "Incremental Least Squares Policy Iteration for POMDPs." AAAI Conference on Artificial Intelligence, 2006.

Markdown

[Li et al. "Incremental Least Squares Policy Iteration for POMDPs." AAAI Conference on Artificial Intelligence, 2006.](https://mlanthology.org/aaai/2006/li2006aaai-incremental/)

BibTeX

@inproceedings{li2006aaai-incremental,
  title     = {{Incremental Least Squares Policy Iteration for POMDPs}},
  author    = {Li, Hui and Liao, Xuejun and Carin, Lawrence},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2006},
  pages     = {1167--1172},
  url       = {https://mlanthology.org/aaai/2006/li2006aaai-incremental/}
}