Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty

Abstract

This paper presents an action-selection technique for reinforcement learning in stationary Markovian environments. The technique can be used in direct algorithms such as Q-learning, or in indirect algorithms such as adaptive dynamic programming. It rests on two principles. The first is to define a local measure of uncertainty using the theory of bandit problems. We show that such a measure suffers from several drawbacks; in particular, applying it directly yields low-quality algorithms that are easily misled by particular configurations of the environment. The second principle is introduced to eliminate this drawback. It consists of treating the local measures of uncertainty as rewards and back-propagating them with dynamic-programming or temporal-difference mechanisms. This makes it possible to reproduce global-scale reasoning about uncertainty using only local measures of it. Numerical simulations clearly demonstrate the effectiveness of these proposals.
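The core idea of the abstract's second principle can be illustrated with a short sketch. The snippet below is not the authors' exact algorithm: it uses a simple count-based bonus as a stand-in for the bandit-derived local uncertainty measure, and all names (`BackPropExplorer`, `bonus_scale`, etc.) are hypothetical. What it shows is the mechanism the abstract describes: the local uncertainty measure is treated like a reward and back-propagated with the same temporal-difference update as the ordinary Q-values, so that uncertainty about distant states can influence action selection locally.

```python
from collections import defaultdict

class BackPropExplorer:
    """Illustrative sketch: TD back-propagation of a local uncertainty bonus."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, bonus_scale=1.0):
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(float)      # back-propagated reward value
        self.u = defaultdict(float)      # back-propagated uncertainty value
        self.counts = defaultdict(int)   # visit counts for the local measure
        self.bonus_scale = bonus_scale

    def local_uncertainty(self, s, a):
        # Stand-in local measure (assumption, not the paper's bandit-based
        # measure): shrinks as the pair (s, a) is tried more often.
        return self.bonus_scale / (1 + self.counts[(s, a)])

    def select(self, s):
        # Act greedily on the sum of value and propagated uncertainty.
        return max(self.actions, key=lambda a: self.q[(s, a)] + self.u[(s, a)])

    def update(self, s, a, r, s2):
        self.counts[(s, a)] += 1
        best = max(self.actions, key=lambda b: self.q[(s2, b)] + self.u[(s2, b)])
        # Standard TD update for the reward value...
        self.q[(s, a)] += self.alpha * (
            r + self.gamma * self.q[(s2, best)] - self.q[(s, a)])
        # ...and the *same* TD mechanism applied to the local uncertainty
        # measure, so uncertainty propagates backward through the state space.
        bonus = self.local_uncertainty(s, a)
        self.u[(s, a)] += self.alpha * (
            bonus + self.gamma * self.u[(s2, best)] - self.u[(s, a)])
```

Because the uncertainty values obey the same Bellman-style recursion as the Q-values, an action leading toward a poorly explored region accumulates a high propagated bonus even when the local measure at the current state is small.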

Cite

Text

Meuleau and Bourgine. "Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty." Machine Learning, 1999. doi:10.1023/A:1007541107674

Markdown

[Meuleau and Bourgine. "Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty." Machine Learning, 1999.](https://mlanthology.org/mlj/1999/meuleau1999mlj-exploration/) doi:10.1023/A:1007541107674

BibTeX

@article{meuleau1999mlj-exploration,
  title     = {{Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty}},
  author    = {Meuleau, Nicolas and Bourgine, Paul},
  journal   = {Machine Learning},
  year      = {1999},
  pages     = {117--154},
  doi       = {10.1023/A:1007541107674},
  volume    = {35},
  url       = {https://mlanthology.org/mlj/1999/meuleau1999mlj-exploration/}
}