Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs
Abstract
If you have planned to achieve one particular goal in a stochastic delayed-rewards problem and then someone asks about a different goal, what should you do? What if you need to be ready to quickly supply an answer for any possible goal? This paper shows that, by using a new kind of automatically generated abstract action hierarchy, preparing for all N possible goals in an N-state problem can be much, much cheaper than N times the work of preparing for one goal. In goal-based Markov Decision Problems, it is usual to generate a policy π(x), mapping states to actions, and a value function J(x), mapping states to an estimate of the minimum expected cost-to-goal starting at x. In this paper we use the terminology that a multi-policy π*(x, y) (defined for all state pairs (x, y)) maps a state x to the first action it should take in order to reach y with minimum expected cost, and a multi-value-function J*(x, y) gives this minimum cost. Building these objects quickly and with ...
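To make the definitions concrete, here is a minimal sketch of the naive baseline the paper improves upon: building J*(x, y) and π*(x, y) by running ordinary value iteration once per goal, paying N times the single-goal cost. This is not the paper's hierarchical algorithm; all names here (P, cost, value_iteration_to_goal, multi_value_function) are illustrative assumptions, not from the paper.

```python
import numpy as np

def value_iteration_to_goal(P, cost, goal, n_iters=500, tol=1e-9):
    """Single-goal solve: J[x] = minimum expected cost-to-goal from x.
    P[a, x, x'] is the probability of landing in x' after action a in x;
    cost[x, a] is the immediate cost of taking a in x. Assumes the goal
    is reachable from every state (otherwise J diverges)."""
    n_states = P.shape[1]
    J = np.zeros(n_states)
    for _ in range(n_iters):
        # Q[x, a] = immediate cost plus expected cost-to-go of the successor.
        Q = cost + np.einsum('axy,y->xa', P, J)
        J_new = Q.min(axis=1)
        J_new[goal] = 0.0  # the goal is absorbing and costs nothing
        if np.max(np.abs(J_new - J)) < tol:
            J = J_new
            break
        J = J_new
    return J, Q.argmin(axis=1)

def multi_value_function(P, cost):
    """Naive multi-value-function: solve once per goal y, so J[x, y] is the
    minimum expected cost from x to y and pi[x, y] is the first action to
    take. This is the N-fold cost the paper's action hierarchy avoids."""
    n_states = P.shape[1]
    J = np.zeros((n_states, n_states))
    pi = np.zeros((n_states, n_states), dtype=int)
    for y in range(n_states):
        J[:, y], pi[:, y] = value_iteration_to_goal(P, cost, y)
    return J, pi

# Toy usage: a 3-state deterministic ring, one action per direction.
N = 3
P = np.zeros((2, N, N))
for x in range(N):
    P[0, x, (x + 1) % N] = 1.0   # action 0: step clockwise
    P[1, x, (x - 1) % N] = 1.0   # action 1: step counter-clockwise
cost = np.ones((N, 2))           # unit cost per step
J, pi = multi_value_function(P, cost)
# J[x, y] is now the minimum expected number of steps from x to y.
```

On a problem with N states this loop performs N separate dynamic-programming solves; the multi-value-function view pays off precisely because the paper's hierarchy shares work between goals rather than repeating it.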
Cite
Text
Moore et al. "Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs." International Joint Conference on Artificial Intelligence, 1999.
Markdown
[Moore et al. "Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs." International Joint Conference on Artificial Intelligence, 1999.](https://mlanthology.org/ijcai/1999/moore1999ijcai-multi/)
BibTeX
@inproceedings{moore1999ijcai-multi,
title = {{Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs}},
  author = {Moore, Andrew W. and Baird, III, Leemon C. and Kaelbling, Leslie Pack},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {1999},
  pages = {1316--1323},
url = {https://mlanthology.org/ijcai/1999/moore1999ijcai-multi/}
}