Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery

Abstract

Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high stochasticity. We present approaches that mitigate each of these curses. To handle the state-space explosion, we introduce “tabular linear functions” that generalize tile-coding and linear value functions. Action space complexity is reduced by replacing complete joint action space search with a form of hill climbing. To deal with high stochasticity, we introduce a new algorithm called ASH-learning, which is an afterstate version of H-Learning. Our extensions make it practical to apply reinforcement learning to a domain of product delivery – an optimization problem that combines inventory control and vehicle routing.
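The action-space idea in the abstract, replacing exhaustive search over the joint action space with hill climbing, can be illustrated with a short sketch. This is not the paper's implementation; the joint value function `q`, the per-agent action set, and the restart count are all hypothetical stand-ins. The key point is that coordinate ascent evaluates only one agent's alternatives at a time, avoiding the |A|^n blowup of enumerating every joint action.

```python
import random

def hill_climb_joint_action(q, n_agents, actions, restarts=3, seed=0):
    """Greedy coordinate ascent over a joint action space.

    Instead of evaluating all len(actions)**n_agents joint actions,
    repeatedly improve one agent's action at a time until no single
    change raises the (hypothetical) joint value q(joint_action).
    Random restarts hedge against local maxima.
    """
    rng = random.Random(seed)
    best, best_val = None, float("-inf")
    for _ in range(restarts):
        # Start from a random joint action (one choice per agent).
        joint = [rng.choice(actions) for _ in range(n_agents)]
        improved = True
        while improved:
            improved = False
            for i in range(n_agents):
                for a in actions:
                    if a == joint[i]:
                        continue
                    # Try changing only agent i's action.
                    cand = joint[:i] + [a] + joint[i + 1:]
                    if q(cand) > q(joint):
                        joint, improved = cand, True
        val = q(joint)
        if val > best_val:
            best, best_val = joint, val
    return best, best_val
```

For a separable objective (each agent's contribution is independent), coordinate ascent reaches the global optimum; for coupled objectives like shared delivery routes, the restarts matter more.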

Cite

Text

Proper and Tadepalli. "Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery." European Conference on Machine Learning, 2006. doi:10.1007/11871842_74

Markdown

[Proper and Tadepalli. "Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery." European Conference on Machine Learning, 2006.](https://mlanthology.org/ecmlpkdd/2006/proper2006ecml-scaling/) doi:10.1007/11871842_74

BibTeX

@inproceedings{proper2006ecml-scaling,
  title     = {{Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery}},
  author    = {Proper, Scott and Tadepalli, Prasad},
  booktitle = {European Conference on Machine Learning},
  year      = {2006},
  pages     = {735--742},
  doi       = {10.1007/11871842_74},
  url       = {https://mlanthology.org/ecmlpkdd/2006/proper2006ecml-scaling/}
}