Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery
Abstract
Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high stochasticity. We present approaches that mitigate each of these curses. To handle the state-space explosion, we introduce “tabular linear functions” that generalize tile-coding and linear value functions. Action space complexity is reduced by replacing complete joint action space search with a form of hill climbing. To deal with high stochasticity, we introduce a new algorithm called ASH-learning, which is an afterstate version of H-Learning. Our extensions make it practical to apply reinforcement learning to a domain of product delivery – an optimization problem that combines inventory control and vehicle routing.
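The "tabular linear functions" mentioned in the abstract combine a table lookup over some discrete state features with a separate linear function over the remaining features in each table cell, so that a pure table (as in tile coding) and a pure linear value function fall out as special cases. A minimal sketch of that idea follows; the class name, update rule, and all parameters are our own illustration, not code from the paper:

```python
import numpy as np

class TabularLinearFunction:
    """Value function V(s) = w[key(s)] . phi(s): a table of linear
    functions, one weight vector per discrete cell.

    With a single cell and phi(s) = raw features, this is an ordinary
    linear value function; with one cell per discretized state and
    phi(s) = [1.0], it degenerates to a plain lookup table.
    (Hypothetical sketch, not the authors' implementation.)
    """

    def __init__(self, n_features, lr=0.1):
        self.n_features = n_features
        self.lr = lr
        self.weights = {}  # discrete cell key -> linear weight vector

    def value(self, key, phi):
        # Unseen cells start at zero value.
        w = self.weights.setdefault(key, np.zeros(self.n_features))
        return float(w @ phi)

    def update(self, key, phi, target):
        # One gradient step on the squared error toward the target value.
        w = self.weights.setdefault(key, np.zeros(self.n_features))
        w += self.lr * (target - w @ phi) * phi
```

The appeal of this representation in a delivery domain is that a few coarse discrete features (e.g. truck location) select the cell, while numeric features (e.g. inventory levels) are handled linearly within it, keeping the table far smaller than a full discretization of the state.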
Cite
Text
Proper and Tadepalli. "Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery." European Conference on Machine Learning, 2006. doi:10.1007/11871842_74
Markdown
[Proper and Tadepalli. "Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery." European Conference on Machine Learning, 2006.](https://mlanthology.org/ecmlpkdd/2006/proper2006ecml-scaling/) doi:10.1007/11871842_74
BibTeX
@inproceedings{proper2006ecml-scaling,
title = {{Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery}},
author = {Proper, Scott and Tadepalli, Prasad},
booktitle = {European Conference on Machine Learning},
year = {2006},
pages = {735--742},
doi = {10.1007/11871842_74},
url = {https://mlanthology.org/ecmlpkdd/2006/proper2006ecml-scaling/}
}