Incremental Least Squares Policy Iteration for POMDPs
Abstract
We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision processes (POMDPs). The ILSPI algorithm computes a basis representation of the infinite-horizon value function by minimizing the squared Bellman residual, and performs policy improvement in reachable belief states. A number of optimal basis functions are determined by the algorithm to minimize the Bellman residual incrementally, via efficient computations. We show that, by using optimally determined basis functions, the policy can be improved successively on a set of most probable belief points sampled from the reachable belief set. As the ILSPI is based on belief sample points, it represents a point-based policy iteration method. The results on four benchmark problems show that the ILSPI compares competitively to its value-iteration counterparts in terms of both performance and computational efficiency.
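The core computation the abstract describes, fitting a basis representation of the value function by minimizing the squared Bellman residual over sampled belief points, can be illustrated with a small least-squares sketch. This is not the ILSPI algorithm itself (which selects basis functions incrementally and interleaves policy improvement); it only shows, under hypothetical synthetic data, how weights for a fixed set of basis functions would be fit. The arrays `Phi`, `Phi_next`, and `r` are assumed placeholders for features at sampled beliefs, expected features at successor beliefs under the current policy, and expected immediate rewards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N sampled belief points, K basis functions.
N, K, gamma = 200, 5, 0.95

Phi = rng.random((N, K))       # phi(b_i): basis features at each sampled belief
Phi_next = rng.random((N, K))  # expected successor features under the current policy
r = rng.random(N)              # expected immediate reward at each belief

# Value approximation V(b) = phi(b)^T w; the Bellman residual at belief b_i is
#   phi(b_i)^T w - [r_i + gamma * E[phi(b'_i)]^T w] = ((Phi - gamma*Phi_next) w - r)_i.
# Minimizing its squared norm over w is an ordinary least-squares problem.
A = Phi - gamma * Phi_next
w, *_ = np.linalg.lstsq(A, r, rcond=None)

residual = A @ w - r
print(float(residual @ residual))  # squared Bellman residual after the fit
```

The least-squares fit can only lower the squared residual relative to any other choice of weights, including zero weights; ILSPI's incremental step then adds basis functions one at a time so that each addition further reduces this residual.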
Cite
Text: Li et al. "Incremental Least Squares Policy Iteration for POMDPs." AAAI Conference on Artificial Intelligence, 2006.
Markdown: [Li et al. "Incremental Least Squares Policy Iteration for POMDPs." AAAI Conference on Artificial Intelligence, 2006.](https://mlanthology.org/aaai/2006/li2006aaai-incremental/)
BibTeX:
@inproceedings{li2006aaai-incremental,
title = {{Incremental Least Squares Policy Iteration for POMDPs}},
author = {Li, Hui and Liao, Xuejun and Carin, Lawrence},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2006},
pages = {1167--1172},
url = {https://mlanthology.org/aaai/2006/li2006aaai-incremental/}
}