Active Exploration in Dynamic Environments

Abstract

Whenever an agent learns to control an unknown environment, two opposing principles have to be combined, namely: exploration (long-term optimization) and exploitation (short-term optimization). Many real-valued connectionist approaches to learning control realize exploration by randomness in action selection. This might be disadvantageous when costs are assigned to "negative experiences". The basic idea presented in this paper is to make an agent explore unknown regions in a more directed manner. This is achieved by a so-called competence map, which is trained to predict the controller's accuracy, and is used for guiding exploration. Based on this, a bistable system enables smoothly switching attention between two behaviors - exploration and exploitation - depending on expected costs and knowledge gain. The appropriateness of this method is demonstrated by a simple robot navigation task.
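The two mechanisms named in the abstract, a competence map that guides exploration and a bistable system that switches between exploring and exploiting, can be illustrated with a small sketch. This is not the authors' connectionist method. It is a hypothetical toy under stated assumptions: the environment is a 1-D corridor of `n_states` cells, the competence map is a lookup table (not a trained network) updated by an exponential moving average toward the observed accuracy `1 - prediction_error`, the bistable system is a generic self-exciting sigmoid unit with gain `w = 8`, its input `drive` stands in for "expected knowledge gain minus expected cost", and actions are chosen by linearly blending an exploration score and an exploitation score. All function names and parameters here are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def update_competence(competence, state, prediction_error, lr=0.5):
    """Hypothetical tabular competence map: move the estimate for `state`
    toward the observed accuracy, taken as 1 - prediction_error
    (errors are assumed to lie in [0, 1])."""
    target = 1.0 - prediction_error
    competence[state] += lr * (target - competence[state])


def bistable_step(a, drive, w=8.0):
    """One update of a self-exciting unit a in (0, 1).

    With w = 8 and drive = 0 the map has stable fixed points near 0.02 and
    0.98 and an unstable one at 0.5, so `a` stays in its current mode until
    `drive` (assumed: expected knowledge gain minus expected cost) pushes it
    across. a near 1 means "explore", a near 0 means "exploit".
    """
    return sigmoid(w * (a - 0.5) + drive)


def select_action(state, goal, competence, a, n_states):
    """Pick -1 or +1 in a 1-D corridor by blending two scores.

    explore: expected knowledge gain = 1 - competence of the next state.
    exploit: negative distance from the next state to the goal.
    The bistable activation `a` weights the two behaviors.
    """
    best_action, best_score = None, -math.inf
    for action in (-1, +1):
        nxt = min(max(state + action, 0), n_states - 1)  # walls at both ends
        explore = 1.0 - competence[nxt]
        exploit = -abs(goal - nxt)
        score = a * explore + (1.0 - a) * exploit
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

For example, with every cell well known except cell 4, an agent at cell 5 heading for cell 9 steps back toward cell 4 while the unit is in its exploring mode (`a` near 0.98) and steps toward the goal in its exploiting mode (`a` near 0.02). Because of the hysteresis, a brief positive `drive` is enough to flip the unit into exploring, and it stays there after the drive returns to zero; this is the kind of smooth but decisive switching of attention the abstract describes, though the paper's actual dynamics may differ.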

Cite

Text

Thrun and Möller. "Active Exploration in Dynamic Environments." Neural Information Processing Systems, 1991.

Markdown

[Thrun and Möller. "Active Exploration in Dynamic Environments." Neural Information Processing Systems, 1991.](https://mlanthology.org/neurips/1991/thrun1991neurips-active/)

BibTeX

@inproceedings{thrun1991neurips-active,
  title     = {{Active Exploration in Dynamic Environments}},
  author    = {Thrun, Sebastian B. and Möller, Knut},
  booktitle = {Neural Information Processing Systems},
  year      = {1991},
  pages     = {531--538},
  url       = {https://mlanthology.org/neurips/1991/thrun1991neurips-active/}
}