Solving POMDPs with Levin Search and EIRA
Abstract
Partially observable Markov decision problems (POMDPs) recently received a lot of attention in the reinforcement learning community. No attention, however, has been paid to Levin's universal search through program space (LS), which is theoretically optimal for a wide variety of search problems including many POMDPs. Experiments in this paper first show that LS can solve partially observable mazes (POMs) involving many more states and obstacles than those solved by various previous authors (here, LS also can easily outperform Q-learning). We then note, however, that LS is not necessarily optimal for "incremental" learning problems where experience with previous problems may help to reduce future search costs. For this reason, we introduce an adaptive extension of LS (ALS) which uses experience to increase probabilities of instructions occurring in successful programs found by LS. To deal with cases where ALS does not lead to long term performance improvement, we use the recent technique...
Cite
Text
Wiering and Schmidhuber. "Solving POMDPs with Levin Search and EIRA." International Conference on Machine Learning, 1996.Markdown
[Wiering and Schmidhuber. "Solving POMDPs with Levin Search and EIRA." International Conference on Machine Learning, 1996.](https://mlanthology.org/icml/1996/wiering1996icml-solving/)BibTeX
@inproceedings{wiering1996icml-solving,
title = {{Solving POMDPs with Levin Search and EIRA}},
author = {Wiering, Marco A. and Schmidhuber, Jürgen},
booktitle = {International Conference on Machine Learning},
year = {1996},
pages = {534-542},
url = {https://mlanthology.org/icml/1996/wiering1996icml-solving/}
}