Reinforcement Learning with Value Advice
Abstract
The problem we consider in this paper is reinforcement learning with value advice. In this setting, the agent is given limited access to an oracle that can tell it the expected return (value) of any state-action pair with respect to the optimal policy. The agent must use this value to learn an explicit policy that performs well in the environment. We provide an algorithm called RLAdvice, based on the imitation learning algorithm DAgger. We illustrate the effectiveness of this method in the Arcade Learning Environment on three different games, using value estimates from UCT as advice.
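The abstract describes the setting only at a high level, so the following is a minimal sketch of a DAgger-style loop driven by value advice, not the paper's RLAdvice algorithm. Everything in it is an assumption made for illustration: the six-state chain environment, the names `step`, `oracle_value`, `greedy`, and `rl_advice_sketch`, the exact closed-form oracle (the paper uses UCT estimates instead), and the lookup table that stands in for a learned function approximator.

```python
N_STATES = 6          # hypothetical chain: states 0..5, state 5 is terminal
ACTIONS = (0, 1)      # 0 = left, 1 = right
GAMMA = 0.9           # discount factor (assumed)

def step(state, action):
    """Deterministic chain dynamics, clipped at the left end."""
    return max(0, state - 1) if action == 0 else state + 1

def oracle_value(state, action):
    """Stand-in for the value oracle: exact Q* of the toy chain.

    Reward 1 is received on entering the terminal state, so
    Q*(s, a) = GAMMA ** (remaining steps after taking a).
    """
    return GAMMA ** (N_STATES - 1 - step(state, action))

def greedy(q_of, state):
    """Action with the highest value under q_of (ties go to action 0)."""
    return max(ACTIONS, key=lambda a: q_of(state, a))

def rl_advice_sketch(iterations=3, horizon=20):
    dataset = {}  # (state, action) -> advised value, aggregated over iterations

    def learned(state, action):
        # Lookup table in place of a trained regressor; unseen pairs score 0.
        return dataset.get((state, action), 0.0)

    for i in range(iterations):
        # DAgger-style schedule: follow the advice first, then the learner.
        behaviour = oracle_value if i == 0 else learned
        state = 0
        for _ in range(horizon):
            if state == N_STATES - 1:
                break
            # Ask the oracle about every action at each visited state.
            for a in ACTIONS:
                dataset[(state, a)] = oracle_value(state, a)
            state = step(state, greedy(behaviour, state))

    # Explicit policy: greedy with respect to the learned values.
    return {s: greedy(learned, s) for s in range(N_STATES - 1)}

policy = rl_advice_sketch()
```

The point of the aggregation step is that the oracle is queried on states reached by the agent's own behaviour, so the final policy is trained on the state distribution it will actually encounter. In this toy chain the first rollout already covers every non-terminal state, which is why later iterations add nothing new.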
Cite
Text
Daswani et al. "Reinforcement Learning with Value Advice." Proceedings of the Sixth Asian Conference on Machine Learning, 2014.
Markdown
[Daswani et al. "Reinforcement Learning with Value Advice." Proceedings of the Sixth Asian Conference on Machine Learning, 2014.](https://mlanthology.org/acml/2014/daswani2014acml-reinforcement/)
BibTeX
@inproceedings{daswani2014acml-reinforcement,
title = {{Reinforcement Learning with Value Advice}},
author = {Daswani, Mayank and Sunehag, Peter and Hutter, Marcus},
booktitle = {Proceedings of the Sixth Asian Conference on Machine Learning},
year = {2014},
pages = {299-314},
volume = {39},
url = {https://mlanthology.org/acml/2014/daswani2014acml-reinforcement/}
}