Continuous Rapid Action Value Estimates
Abstract
In the last decade, Monte-Carlo Tree Search (MCTS) has revolutionized the domain of large-scale Markov Decision Process problems. MCTS most often uses the Upper Confidence Tree algorithm to handle the exploration versus exploitation trade-off, while a few heuristics are used to guide the exploration in large search spaces. Among these heuristics is the Rapid Action Value Estimate (RAVE). This paper is concerned with extending the RAVE heuristic to continuous action and state spaces. The approach is experimentally validated on an artificial benchmark problem, the treasure hunt game, and on a real-world energy management problem.
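As background for the heuristic this paper extends, the discrete RAVE score of Gelly and Silver blends the Monte-Carlo value of an action with its all-moves-as-first (AMAF) value, with a weight that decays as the node is visited more often. The sketch below is a minimal, hypothetical illustration of that discrete-case blend (function name, constants `c` and `k` are assumptions), not the continuous-space extension contributed by this paper.

```python
import math

def rave_uct_score(q, n, q_rave, n_rave, parent_visits, c=1.0, k=1000.0):
    """Sketch of a RAVE-blended UCT score for one action at a tree node.

    q, n           -- Monte-Carlo mean value and visit count of the action
    q_rave, n_rave -- AMAF (RAVE) mean value and count of the action
    parent_visits  -- visit count of the parent node
    c, k           -- exploration and RAVE-equivalence constants (assumed values)
    """
    if n == 0:
        # Unvisited actions are tried first.
        return float("inf")
    # RAVE weight: close to 1 for rarely visited nodes, decays toward 0.
    beta = math.sqrt(k / (3 * parent_visits + k))
    exploit = (1 - beta) * q + beta * q_rave
    explore = c * math.sqrt(math.log(parent_visits) / n)
    return exploit + explore
```

The key design point is that early on the (low-variance but biased) RAVE value dominates, and as statistics accumulate the score reverts to plain UCT; the paper's contribution is making such estimates usable when actions and states are continuous rather than enumerable.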
Cite
Text:
Couëtoux et al. "Continuous Rapid Action Value Estimates." Proceedings of the Third Asian Conference on Machine Learning, 2011.

Markdown:
[Couëtoux et al. "Continuous Rapid Action Value Estimates." Proceedings of the Third Asian Conference on Machine Learning, 2011.](https://mlanthology.org/acml/2011/couetoux2011acml-continuous/)

BibTeX:
@inproceedings{couetoux2011acml-continuous,
title = {{Continuous Rapid Action Value Estimates}},
author = {Couëtoux, Adrien and Milone, Mario and Brendel, Mátyás and Doghmen, Hassan and Sebag, Michèle and Teytaud, Olivier},
booktitle = {Proceedings of the Third Asian Conference on Machine Learning},
year = {2011},
pages = {19--31},
volume = {20},
url = {https://mlanthology.org/acml/2011/couetoux2011acml-continuous/}
}