Efficient Exploration for Optimizing Immediate Reward
Abstract
We consider the problem of learning an effective behavior strategy from reward. Although much studied, the issue of how to use prior knowledge to scale optimal behavior learning up to real-world problems remains an important open issue. We investigate the inherent data-complexity of behavior-learning when the goal is simply to optimize immediate reward. Although easier than reinforcement learning, where one must also cope with state dynamics, immediate reward learning is still a common problem and is fundamentally harder than supervised learning. For optimizing immediate reward, prior knowledge can be expressed either as a bias on the space of possible reward models, or a bias on the space of possible controllers. We investigate the two paradigmatic learning approaches of indirect (reward-model) learning and direct-control learning, and show that neither uniformly dominates the other in general. Model-based learning has the advantage of generalizing reward experiences across states and actions, but direct-control learning has the advantage of focusing only on potentially optimal actions and avoiding learning irrelevant world details. Both strategies can be strongly advantageous in different circumstances. We introduce hybrid learning strategies that combine the benefits of both approaches, and uniformly improve their learning efficiency.
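To make the contrast between the two approaches concrete, the following is a minimal illustrative sketch, not the algorithms from the paper, which this page does not reproduce. It assumes a small tabular setting with logged (state, action, reward) samples collected by uniform exploration. The function names, the toy data, and the scoring rule used for the direct approach are all hypothetical choices made for illustration.

```python
from collections import defaultdict

def fit_reward_model(samples):
    """Indirect (reward-model) approach: average observed reward per (state, action)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, action, reward in samples:
        totals[(state, action)] += reward
        counts[(state, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}

def greedy_controller(model, states, actions):
    """Act greedily on the estimated reward model in every state."""
    return {
        s: max(actions, key=lambda a: model.get((s, a), float("-inf")))
        for s in states
    }

def pick_controller(controllers, samples):
    """Direct-control approach: score each candidate controller using only
    the samples whose action agrees with it (assumes uniform exploration)."""
    def score(controller):
        matched = [r for s, a, r in samples if controller[s] == a]
        return sum(matched) / len(matched) if matched else float("-inf")
    return max(controllers, key=score)

# Hypothetical toy data: (state, action, reward) triples.
samples = [
    ("s0", "left", 1.0), ("s0", "right", 0.0),
    ("s1", "left", 0.0), ("s1", "right", 1.0),
]
states, actions = ["s0", "s1"], ["left", "right"]

# Indirect: learn a reward model, then derive a controller from it.
model_based = greedy_controller(fit_reward_model(samples), states, actions)

# Direct: choose among a restricted (assumed) class of candidate controllers.
candidates = [
    {"s0": "left", "s1": "left"},
    {"s0": "left", "s1": "right"},
]
direct = pick_controller(candidates, samples)
```

In this sketch the prior knowledge enters in two different places: as the form of the reward model in the indirect case (here an unrestricted table), and as the restricted set `candidates` in the direct case.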
Cite
Text
Schuurmans and Greenwald. "Efficient Exploration for Optimizing Immediate Reward." AAAI Conference on Artificial Intelligence, 1999.
Markdown
[Schuurmans and Greenwald. "Efficient Exploration for Optimizing Immediate Reward." AAAI Conference on Artificial Intelligence, 1999.](https://mlanthology.org/aaai/1999/schuurmans1999aaai-efficient/)
BibTeX
@inproceedings{schuurmans1999aaai-efficient,
title = {{Efficient Exploration for Optimizing Immediate Reward}},
author = {Schuurmans, Dale and Greenwald, Lloyd G.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {1999},
pages = {385-392},
url = {https://mlanthology.org/aaai/1999/schuurmans1999aaai-efficient/}
}