An Intrinsic Reward Mechanism for Efficient Exploration

Abstract

How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exploit later? We formulate this problem as a Markov Decision Process by explicitly modeling the internal state of the agent and propose a principled heuristic for its solution. We present experimental results in a number of domains, also exploring the algorithm's use for learning a policy for a skill given its reward function, an important but neglected component of skill discovery.

Cite

Text

Simsek and Barto. "An Intrinsic Reward Mechanism for Efficient Exploration." International Conference on Machine Learning, 2006. doi:10.1145/1143844.1143949

Markdown

[Simsek and Barto. "An Intrinsic Reward Mechanism for Efficient Exploration." International Conference on Machine Learning, 2006.](https://mlanthology.org/icml/2006/simsek2006icml-intrinsic/) doi:10.1145/1143844.1143949

BibTeX

@inproceedings{simsek2006icml-intrinsic,
  title     = {{An Intrinsic Reward Mechanism for Efficient Exploration}},
  author    = {Simsek, Özgür and Barto, Andrew G.},
  booktitle = {International Conference on Machine Learning},
  year      = {2006},
  pages     = {833--840},
  doi       = {10.1145/1143844.1143949},
  url       = {https://mlanthology.org/icml/2006/simsek2006icml-intrinsic/}
}