Incremental Least Squares Policy Iteration for POMDPs

Abstract

We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision processes (POMDPs). The ILSPI algorithm computes a basis representation of the infinite-horizon value function by minimizing the squared Bellman residual, and performs policy improvement in reachable belief states. A number of optimal basis functions are determined by the algorithm to minimize the Bellman residual incrementally, via efficient computations. We show that, by using optimally determined basis functions, the policy can be improved successively on a set of most probable belief points sampled from the reachable belief set. As the ILSPI is based on belief sample points, it represents a point-based policy iteration method. The results on four benchmark problems show that the ILSPI compares competitively to its value-iteration counterparts in terms of both performance and computational efficiency.
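The core idea of the abstract, fitting a basis representation of the value function by least-squares minimization of the Bellman residual over sampled reachable belief points, can be illustrated with a small sketch. The POMDP below (two states, two observations, a single fixed-policy action), the polynomial basis `phi`, and all numerical values are illustrative assumptions, not taken from the paper; the sketch shows only the policy-evaluation step, not the full incremental basis selection or policy improvement of ILSPI.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95  # discount factor (assumed)

# Hypothetical 2-state, 2-observation POMDP under one fixed-policy action.
T = np.array([[0.9, 0.1], [0.2, 0.8]])   # T[s, s']: transition probabilities
Z = np.array([[0.8, 0.2], [0.3, 0.7]])   # Z[s', o]: observation probabilities
R = np.array([1.0, -1.0])                # per-state reward

def obs_prob_and_next(b):
    """Return P(o | b) and the Bayes-updated belief for each observation o."""
    pred = b @ T                          # predicted state distribution
    probs, nexts = [], []
    for o in range(Z.shape[1]):
        unnorm = Z[:, o] * pred
        p = unnorm.sum()
        probs.append(p)
        nexts.append(unnorm / p if p > 0 else pred)
    return np.array(probs), np.array(nexts)

def phi(b):
    # A simple polynomial basis over the belief simplex (an assumption;
    # ILSPI selects its basis functions incrementally).
    return np.array([1.0, b[0], b[0] ** 2])

# Sample reachable belief points by simulating belief updates.
beliefs = [np.array([0.5, 0.5])]
for _ in range(200):
    p, nxt = obs_prob_and_next(beliefs[-1])
    beliefs.append(nxt[rng.choice(len(p), p=p)])

# With V(b) = w . phi(b), the Bellman residual at b is
#   w . (phi(b) - gamma * sum_o P(o|b) phi(b^o)) - r(b),
# so minimizing its square over the samples is linear least squares in w.
A, r = [], []
for b in beliefs:
    p, nxt = obs_prob_and_next(b)
    expected_next_phi = sum(pi * phi(bo) for pi, bo in zip(p, nxt))
    A.append(phi(b) - gamma * expected_next_phi)
    r.append(b @ R)
w, *_ = np.linalg.lstsq(np.array(A), np.array(r), rcond=None)

def V(b):
    return phi(b) @ w
```

Because the residual is linear in the basis weights, each fit reduces to one `lstsq` solve over the sampled beliefs; in the full algorithm, this evaluation would alternate with point-based policy improvement at those belief points.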

Cite

Text

Li et al. "Incremental Least Squares Policy Iteration for POMDPs." AAAI Conference on Artificial Intelligence, 2006.

Markdown

[Li et al. "Incremental Least Squares Policy Iteration for POMDPs." AAAI Conference on Artificial Intelligence, 2006.](https://mlanthology.org/aaai/2006/li2006aaai-incremental/)

BibTeX

@inproceedings{li2006aaai-incremental,
  title     = {{Incremental Least Squares Policy Iteration for POMDPs}},
  author    = {Li, Hui and Liao, Xuejun and Carin, Lawrence},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2006},
  pages     = {1167--1172},
  url       = {https://mlanthology.org/aaai/2006/li2006aaai-incremental/}
}