Auto-Exploratory Average Reward Reinforcement Learning
Abstract
We introduce a model-based average reward Reinforcement Learning method called H-learning and compare it with its discounted counterpart, Adaptive Real-Time Dynamic Programming, in a simulated robot scheduling task. We also introduce an extension to H-learning, which automatically explores the unexplored parts of the state space, while always choosing greedy actions with respect to the current value function. We show that this "Auto-exploratory H-learning" performs better than the original H-learning under previously studied exploration methods such as random, recency-based, or counter-based exploration.

Introduction

Reinforcement Learning (RL) is the study of learning agents that improve their performance at some task by receiving rewards and punishments from the environment. Most approaches to reinforcement learning, including Q-learning (Watkins & Dayan 1992) and Adaptive Real-Time Dynamic Programming (ARTDP) (Barto, Bradtke, & Singh 1995), optimize the total discounted reward the ...
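The abstract contrasts the average reward criterion with the discounted one. As a hedged illustration of what "average reward" optimality means, the sketch below runs relative value iteration on a toy two-state MDP with a known model. The toy MDP, the reference-state trick, and the update rule are illustrative assumptions standard in average-reward dynamic programming; they are not the paper's H-learning algorithm, which learns the transition model and gain online.

```python
# Illustrative sketch (assumption, not the paper's algorithm): relative
# value iteration for the average-reward criterion on a toy known-model MDP.
# transitions[s][a] = (next_state, reward); deterministic for simplicity.
transitions = {
    0: {"move": (1, 0.0), "stay": (0, 0.5)},
    1: {"move": (0, 0.0), "stay": (1, 1.0)},
}

h = {s: 0.0 for s in transitions}  # relative value function
ref = 1   # reference state whose backed-up value estimates the gain rho
rho = 0.0

for _ in range(100):
    # Bellman backup without discounting: best one-step reward plus h(s')
    backed = {s: max(r + h[s2] for (s2, r) in transitions[s].values())
              for s in transitions}
    rho = backed[ref]                           # gain estimate
    h = {s: backed[s] - rho for s in transitions}  # renormalize h

print(rho)  # -> 1.0: the optimal policy stays in state 1, earning 1 per step
```

At the fixed point, the greedy policy with respect to h moves from state 0 to state 1 (0 + h[1] = 0 beats 0.5 + h[0] = -0.5) and then stays, so the long-run reward per step is exactly the gain rho = 1.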
Cite
Ok, DoKyeong and Prasad Tadepalli. "Auto-Exploratory Average Reward Reinforcement Learning." AAAI Conference on Artificial Intelligence, 1996, pp. 881-887.

BibTeX
@inproceedings{ok1996aaai-auto,
title = {{Auto-Exploratory Average Reward Reinforcement Learning}},
author = {Ok, DoKyeong and Tadepalli, Prasad},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {1996},
  pages = {881--887},
url = {https://mlanthology.org/aaai/1996/ok1996aaai-auto/}
}