Reinforcement Learning Applied to Linear Quadratic Regulation
Abstract
Recent research on reinforcement learning has focused on algorithms based on the principles of Dynamic Programming (DP). One of the most promising areas of application for these algorithms is the control of dynamical systems, and some impressive results have been achieved. However, there are significant gaps between practice and theory. In particular, there are no convergence proofs for problems with continuous state and action spaces, or for systems involving non-linear function approximators (such as multilayer perceptrons). This paper presents research applying DP-based reinforcement learning theory to Linear Quadratic Regulation (LQR), an important class of control problems involving continuous state and action spaces and requiring a simple type of non-linear function approximator. We describe an algorithm based on Q-learning that is proven to converge to the optimal controller for a large class of LQR problems. We also describe a slightly different algorithm that is only locally convergent to the optimal Q-function, demonstrating one of the possible pitfalls of using a non-linear function approximator with DP-based learning.
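To make the setting concrete, the sketch below illustrates quadratic-Q-function policy iteration for a discrete-time LQR problem. It is not the paper's exact algorithm: the plant matrices, cost weights, exploration noise, and the batch least-squares fit of the Q-function parameters are all illustrative assumptions introduced here for the example.

```python
# Minimal sketch (assumed setup, not the paper's exact algorithm) of
# Q-function policy iteration for LQR: the Q-function of a linear policy
# u = K x is quadratic, Q(x, u) = [x; u]' H [x; u], so H can be fit by
# least squares from sampled transitions and then used to improve K.
import numpy as np

rng = np.random.default_rng(0)

# Assumed stable 2-state, 1-input plant x_{t+1} = A x + B u with
# quadratic cost x' E x + u' F u (illustrative values).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
E = np.eye(2)            # state-cost weight
F = 0.1 * np.eye(1)      # control-cost weight
n, m = 2, 1

def phi(x, u):
    # Quadratic features: flattened outer product of z = [x; u], so that
    # theta . phi(x, u) = z' Theta z with Theta = theta reshaped to (n+m, n+m).
    z = np.concatenate([x, u])
    return np.outer(z, z).ravel()

K = np.zeros((m, n))      # initial (stabilizing) feedback gain
for it in range(10):
    # Collect transitions under the current policy plus exploration noise.
    rows, targets = [], []
    x = rng.normal(size=n)
    for t in range(400):
        u = K @ x + 0.5 * rng.normal(size=m)   # exploratory control
        x_next = A @ x + B @ u
        cost = x @ E @ x + u @ F @ u
        u_next = K @ x_next                    # on-policy action at x_next
        rows.append(phi(x, u) - phi(x_next, u_next))
        targets.append(cost)
        # Reset if the state ever grows large (guards a bad intermediate K).
        x = x_next if np.linalg.norm(x_next) < 1e3 else rng.normal(size=n)

    # Policy evaluation: least-squares fit of the Bellman equation
    # Q(x, u) = cost + Q(x', K x') over the quadratic Q-function parameters.
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    H = theta.reshape(n + m, n + m)
    H = (H + H.T) / 2                          # symmetrize

    # Policy improvement: u minimizing Q(x, u) gives K = -H_uu^{-1} H_ux.
    H_uu = H[n:, n:]
    H_ux = H[n:, :n]
    K = -np.linalg.solve(H_uu, H_ux)

print("learned feedback gain K:\n", K)
```

In this sketch the Q-function is quadratic in the state-action pair but non-linear in the underlying system parameters, which is the "simple type of non-linear function approximator" the abstract refers to; the exploration noise is needed so the least-squares problem is well conditioned.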
Cite
Bradtke. "Reinforcement Learning Applied to Linear Quadratic Regulation." Neural Information Processing Systems, 1992.

BibTeX
@inproceedings{bradtke1992neurips-reinforcement,
title = {{Reinforcement Learning Applied to Linear Quadratic Regulation}},
author = {Bradtke, Steven J.},
booktitle = {Neural Information Processing Systems},
year = {1992},
pages = {295-302},
url = {https://mlanthology.org/neurips/1992/bradtke1992neurips-reinforcement/}
}