Transition Point Dynamic Programming
Abstract
Transition point dynamic programming (TPDP) is a memory-based, reinforcement learning, direct dynamic programming approach to adaptive optimal control that can reduce the learning time and memory usage required for the control of continuous stochastic dynamic systems. TPDP does so by determining an ideal set of transition points (TPs) which specify only the control action changes necessary for optimal control. TPDP converges to an ideal TP set by using a variation of Q-learning to assess the merits of adding, swapping and removing TPs from states throughout the state space. When applied to a race track problem, TPDP learned the optimal control policy much sooner than conventional Q-learning, and was able to do so using less memory.
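The abstract describes TPDP as a variation of Q-learning used to judge whether a transition point (a state paired with a new control action) is worth keeping. The paper's exact update rules are not given here, so the sketch below shows only the standard tabular Q-learning step that TPDP builds on; the states, actions, and reward values are illustrative placeholders, not the paper's race track setup.

```python
# Hedged sketch: generic tabular Q-learning update, the building block the
# abstract says TPDP varies. Everything below (state labels, action names,
# learning rate, reward) is a hypothetical example, not the paper's algorithm.

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q is a dict of dicts: Q[state][action] -> value.
    """
    best_next = max(Q[next_state].values()) if Q.get(next_state) else 0.0
    Q.setdefault(state, {}).setdefault(action, 0.0)
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
    return Q[state][action]

# A transition point would pair a state with the action to switch to;
# TPDP keeps a TP only where updates like this show the switch pays off.
Q = {}
v = q_update(Q, state=0, action="accelerate", reward=1.0, next_state=1)
```

In TPDP the key economy is that such values need only be maintained at candidate transition points rather than at every state, which is how the authors report reduced memory use.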
Cite
Text
Buckland and Lawrence. "Transition Point Dynamic Programming." Neural Information Processing Systems, 1993.
Markdown
[Buckland and Lawrence. "Transition Point Dynamic Programming." Neural Information Processing Systems, 1993.](https://mlanthology.org/neurips/1993/buckland1993neurips-transition/)
BibTeX
@inproceedings{buckland1993neurips-transition,
title = {{Transition Point Dynamic Programming}},
author = {Buckland, Kenneth M. and Lawrence, Peter D.},
booktitle = {Neural Information Processing Systems},
year = {1993},
pages = {639--646},
url = {https://mlanthology.org/neurips/1993/buckland1993neurips-transition/}
}