Efficient Exploration for Optimizing Immediate Reward

Schuurmans, Dale; Greenwald, Lloyd G.

AAAI 1999 pp. 385-392

Abstract

We consider the problem of learning an effective behavior strategy from reward. Although much studied, the issue of how to use prior knowledge to scale optimal behavior learning up to real-world problems remains an important open issue. We investigate the inherent data-complexity of behavior-learning when the goal is simply to optimize immediate reward. Although easier than reinforcement learning, where one must also cope with state dynamics, immediate reward learning is still a common problem and is fundamentally harder than supervised learning. For optimizing immediate reward, prior knowledge can be expressed either as a bias on the space of possible reward models, or a bias on the space of possible controllers. We investigate the two paradigmatic learning approaches of indirect (reward-model) learning and direct-control learning, and show that neither uniformly dominates the other in general. Model-based learning has the advantage of generalizing reward experiences across states and actions, but direct-control learning has the advantage of focusing only on potentially optimal actions and avoiding learning irrelevant world details. Both strategies can be strongly advantageous in different circumstances. We introduce hybrid learning strategies that combine the benefits of both approaches, and uniformly improve their learning efficiency.

Cite

Text

Schuurmans and Greenwald. "Efficient Exploration for Optimizing Immediate Reward." AAAI Conference on Artificial Intelligence, 1999.

Markdown

[Schuurmans and Greenwald. "Efficient Exploration for Optimizing Immediate Reward." AAAI Conference on Artificial Intelligence, 1999.](https://mlanthology.org/aaai/1999/schuurmans1999aaai-efficient/)

BibTeX

@inproceedings{schuurmans1999aaai-efficient,
  title     = {{Efficient Exploration for Optimizing Immediate Reward}},
  author    = {Schuurmans, Dale and Greenwald, Lloyd G.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {1999},
  pages     = {385--392},
  url       = {https://mlanthology.org/aaai/1999/schuurmans1999aaai-efficient/}
}