Transition Point Dynamic Programming
Abstract
Transition point dynamic programming (TPDP) is a memory-based, reinforcement learning, direct dynamic programming approach to adaptive optimal control that can reduce the learning time and memory usage required for the control of continuous stochastic dynamic systems. TPDP does so by determining an ideal set of transition points (TPs) which specify only the control action changes necessary for optimal control. TPDP converges to an ideal TP set by using a variation of Q-learning to assess the merits of adding, swapping and removing TPs from states throughout the state space. When applied to a race track problem, TPDP learned the optimal control policy much sooner than conventional Q-learning, and was able to do so using less memory.
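The abstract describes TPDP as a variation of Q-learning used to judge whether a transition point (a state paired with a new control action) is worth keeping. The paper's exact update rules are not given here, so the sketch below shows only the standard tabular Q-learning step that TPDP builds on; the states, actions, and reward values are illustrative placeholders, not the paper's race track setup.

```python
# Hedged sketch: generic tabular Q-learning update, the building block the
# abstract says TPDP varies. Everything below (state labels, action names,
# learning rate, reward) is a hypothetical example, not the paper's algorithm.

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q is a dict of dicts: Q[state][action] -> value.
    """
    best_next = max(Q[next_state].values()) if Q.get(next_state) else 0.0
    Q.setdefault(state, {}).setdefault(action, 0.0)
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
    return Q[state][action]

# A transition point would pair a state with the action to switch to;
# TPDP keeps a TP only where updates like this show the switch pays off.
Q = {}
v = q_update(Q, state=0, action="accelerate", reward=1.0, next_state=1)
```

In TPDP the key economy is that such values need only be maintained at candidate transition points rather than at every state, which is how the authors report reduced memory use.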
Cite
Text
Buckland and Lawrence. "Transition Point Dynamic Programming." Neural Information Processing Systems, 1993.
Markdown
[Buckland and Lawrence. "Transition Point Dynamic Programming." Neural Information Processing Systems, 1993.](https://mlanthology.org/neurips/1993/buckland1993neurips-transition/)
BibTeX
@inproceedings{buckland1993neurips-transition,
title = {{Transition Point Dynamic Programming}},
author = {Buckland, Kenneth M. and Lawrence, Peter D.},
booktitle = {Neural Information Processing Systems},
year = {1993},
pages = {639--646},
url = {https://mlanthology.org/neurips/1993/buckland1993neurips-transition/}
}