Local Bandit Approximation for Optimal Learning Problems
Abstract
In general, procedures for determining Bayes-optimal adaptive controls for Markov decision processes (MDP's) require a prohibitive amount of computation: the optimal learning problem is intractable. This paper proposes an approximate approach in which bandit processes are used to model, in a certain "local" sense, a given MDP. Bandit processes constitute an important subclass of MDP's, and have optimal learning strategies (defined in terms of Gittins indices) that can be computed relatively efficiently. Thus, one scheme for achieving approximately-optimal learning for general MDP's proceeds by taking actions suggested by strategies that are optimal with respect to local bandit models.
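The entry above contains only the abstract, so the following is a minimal, self-contained sketch, not the authors' algorithm, of the kind of bandit computation the paper builds on: approximating the Gittins index of a one-armed Bernoulli bandit with a Beta(a, b) posterior by calibrating against a constant-reward retirement option and solving the resulting optimal-stopping problem with truncated backward induction. The function name, discount factor, and truncation horizon are all illustrative assumptions.

```python
import numpy as np

def gittins_index_bernoulli(a, b, gamma=0.9, horizon=50, tol=1e-4):
    """Approximate the Gittins index (in per-step reward units) of a
    Bernoulli arm with Beta(a, b) posterior, by binary-searching for
    the retirement reward lam at which the learner is indifferent
    between pulling the arm and retiring for lam / (1 - gamma)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        retire = lam / (1.0 - gamma)
        # Backward induction over the truncated success/failure lattice:
        # at layer `depth`, state i means i successes and depth - i
        # failures observed beyond the prior. Assume retirement at the
        # truncation horizon.
        V = np.full(horizon + 1, retire)
        for depth in range(horizon - 1, -1, -1):
            newV = np.empty(depth + 1)
            for i in range(depth + 1):
                p = (a + i) / (a + b + depth)  # posterior mean of arm
                # Success: reward 1, move to (i + 1) in the next layer;
                # failure: reward 0, stay at index i in the next layer.
                cont = p * (1.0 + gamma * V[i + 1]) + (1.0 - p) * gamma * V[i]
                newV[i] = max(retire, cont)
            V = newV
        if V[0] > retire + 1e-12:
            lo = lam  # arm still worth pulling: index exceeds lam
        else:
            hi = lam
    return 0.5 * (lo + hi)
```

For a uniform Beta(1, 1) prior and gamma = 0.9, the returned index exceeds the posterior mean of 0.5, reflecting the exploration value of further pulls; the truncation horizon trades accuracy for computation, which is why such indices are "relatively efficient" compared with solving the full Bayes-adaptive MDP.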
Cite
Duff and Barto. "Local Bandit Approximation for Optimal Learning Problems." Neural Information Processing Systems, 1996. https://mlanthology.org/neurips/1996/duff1996neurips-local/

BibTeX:
@inproceedings{duff1996neurips-local,
title = {{Local Bandit Approximation for Optimal Learning Problems}},
author = {Duff, Michael O. and Barto, Andrew G.},
booktitle = {Neural Information Processing Systems},
year = {1996},
pages = {1019--1025},
url = {https://mlanthology.org/neurips/1996/duff1996neurips-local/}
}