Adaptive Choice of Grid and Time in Reinforcement Learning

Abstract

We propose local error estimates together with algorithms for adaptive a-posteriori grid and time refinement in reinforcement learning. We consider a deterministic system with continuous state and time with infinite horizon discounted cost functional. For grid refinement we follow the procedure of numerical methods for the Bellman equation. For time refinement we propose a new criterion, based on consistency estimates of discrete solutions of the Bellman equation. We demonstrate that an optimal ratio of time to space discretization is crucial for optimal learning rates and accuracy of the approximate optimal value function.
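
To make the two discretization parameters concrete, the following is a minimal sketch of value iteration for a time- and space-discretized Bellman equation on a uniform grid. It is not the paper's adaptive refinement algorithm. The toy system (dynamics x' = u on [-1, 1], running cost x^2, discount rate `rho`), the function name `solve_discrete_bellman`, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def solve_discrete_bellman(n_grid, tau, rho=1.0, tol=1e-8, max_iter=100_000):
    """Value iteration for V(x) = min_u [tau*g(x) + exp(-rho*tau) * V(x + tau*u)].

    Assumed toy problem: dynamics x' = u with u in {-1, 0, 1},
    state x in [-1, 1], running cost g(x) = x**2.
    """
    xs = np.linspace(-1.0, 1.0, n_grid)       # space discretization
    controls = np.array([-1.0, 0.0, 1.0])
    beta = np.exp(-rho * tau)                  # discount over one time step
    step_cost = tau * xs**2                    # cost accumulated over one step
    # Successor states for every (grid point, control), kept inside the domain.
    nxt = np.clip(xs[:, None] + tau * controls[None, :], -1.0, 1.0)
    V = np.zeros(n_grid)
    for _ in range(max_iter):
        # Linear interpolation evaluates V between grid points.
        V_next = np.interp(nxt.ravel(), xs, V).reshape(nxt.shape)
        V_new = (step_cost[:, None] + beta * V_next).min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return xs, V_new
        V = V_new
    return xs, V

xs, V = solve_discrete_bellman(n_grid=41, tau=0.05)
```

Here `n_grid` sets the grid spacing and `tau` the time step. Rerunning the sketch with different pairs of values is one simple way to observe how the ratio between the two affects the computed value function.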

Cite

Text

Pareigis. "Adaptive Choice of Grid and Time in Reinforcement Learning." Neural Information Processing Systems, 1997.

Markdown

[Pareigis. "Adaptive Choice of Grid and Time in Reinforcement Learning." Neural Information Processing Systems, 1997.](https://mlanthology.org/neurips/1997/pareigis1997neurips-adaptive/)

BibTeX

@inproceedings{pareigis1997neurips-adaptive,
  title     = {{Adaptive Choice of Grid and Time in Reinforcement Learning}},
  author    = {Pareigis, Stephan},
  booktitle = {Neural Information Processing Systems},
  year      = {1997},
  pages     = {1036-1042},
  url       = {https://mlanthology.org/neurips/1997/pareigis1997neurips-adaptive/}
}