Expected Mistake Bound Model for On-Line Reinforcement Learning
Abstract
We propose a model of efficient on-line reinforcement learning based on the expected mistake bound framework introduced by Haussler, Littlestone and Warmuth (1987). The measure of performance we use is the expected difference between the total reward received by the learning agent and that received by an agent behaving optimally from the start. We call this expected difference the cumulative mistake of the agent and we require that it "levels off" at a reasonably fast rate as the learning progresses. We show that this model is polynomially equivalent to the PAC model of off-line reinforcement learning introduced in (Fiechter, 1994). In particular we show how an off-line PAC reinforcement learning algorithm can be transformed into an efficient on-line algorithm in a simple and practical way. An immediate consequence of this result is that the PAC algorithm for the general finite state-space reinforcement learning problem described in (Fiechter, 1994) can be transformed into a polynomial...
Cite
Text
Fiechter. "Expected Mistake Bound Model for On-Line Reinforcement Learning." International Conference on Machine Learning, 1997.Markdown
[Fiechter. "Expected Mistake Bound Model for On-Line Reinforcement Learning." International Conference on Machine Learning, 1997.](https://mlanthology.org/icml/1997/fiechter1997icml-expected/)BibTeX
@inproceedings{fiechter1997icml-expected,
title = {{Expected Mistake Bound Model for On-Line Reinforcement Learning}},
author = {Fiechter, Claude-Nicolas},
booktitle = {International Conference on Machine Learning},
year = {1997},
pages = {116-124},
url = {https://mlanthology.org/icml/1997/fiechter1997icml-expected/}
}