Reinforcement Learning Applied to Linear Quadratic Regulation

Abstract

Recent research on reinforcement learning has focused on algorithms based on the principles of Dynamic Programming (DP). One of the most promising areas of application for these algorithms is the control of dynamical systems, and some impressive results have been achieved. However, there are significant gaps between practice and theory. In particular, there are no convergence proofs for problems with continuous state and action spaces, or for systems involving non-linear function approximators (such as multilayer perceptrons). This paper presents research applying DP-based reinforcement learning theory to Linear Quadratic Regulation (LQR), an important class of control problems involving continuous state and action spaces and requiring a simple type of non-linear function approximator. We describe an algorithm based on Q-learning that is proven to converge to the optimal controller for a large class of LQR problems. We also describe a slightly different algorithm that is only locally convergent to the optimal Q-function, demonstrating one of the possible pitfalls of using a non-linear function approximator with DP-based learning.
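The "simple type of non-linear function approximator" the abstract refers to is a Q-function that is quadratic in the joint state-action vector. As a rough, model-based illustration of that structure (not the paper's data-driven Q-learning updates), the sketch below assumes known dynamics x_{t+1} = A x + B u and quadratic cost x' Qc x + u' R u, computes the quadratic Q-function of the current linear policy, and improves the policy greedily. All names (A, B, Qc, R, K) and the example system are illustrative placeholders, not values from the paper.

# Sketch: policy iteration on the quadratic Q-function of an LQR problem.
# Model-based illustration only; the paper's algorithm estimates the same
# quadratic form from observed transitions rather than from known (A, B).
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def q_matrix(A, B, Qc, R, K):
    """Quadratic Q-function parameters H for the linear policy u = -K x.

    Q_K(x, u) = [x; u]' H [x; u], where H is built from the policy's
    cost-to-go matrix P_K (solution of a discrete Lyapunov equation).
    """
    A_cl = A - B @ K                               # closed-loop dynamics
    P = solve_discrete_lyapunov(A_cl.T, Qc + K.T @ R @ K)
    H_xx = Qc + A.T @ P @ A
    H_xu = A.T @ P @ B
    H_uu = R + B.T @ P @ B
    return np.block([[H_xx, H_xu], [H_xu.T, H_uu]])

def improve_policy(H, n):
    """Greedy policy from H: u = -(H_uu)^{-1} H_ux x, returned as the gain K."""
    H_ux = H[n:, :n]
    H_uu = H[n:, n:]
    return np.linalg.solve(H_uu, H_ux)

if __name__ == "__main__":
    # Small illustrative system with arbitrary, stable open-loop dynamics.
    A = np.array([[0.9, 0.2], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    Qc = np.eye(2)                                 # state cost
    R = np.array([[1.0]])                          # control cost
    K = np.zeros((1, 2))                           # initial stabilizing policy
    for _ in range(20):                            # Q-function policy iteration
        H = q_matrix(A, B, Qc, R, K)
        K = improve_policy(H, n=2)
    print("converged feedback gain K:\n", K)

Because each improvement step only needs the quadratic form H, the same greedy update can be driven by an H estimated from experience, which is the connection the abstract draws between Q-learning and LQR.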

Cite

Text

Bradtke. "Reinforcement Learning Applied to Linear Quadratic Regulation." Neural Information Processing Systems, 1992.

Markdown

[Bradtke. "Reinforcement Learning Applied to Linear Quadratic Regulation." Neural Information Processing Systems, 1992.](https://mlanthology.org/neurips/1992/bradtke1992neurips-reinforcement/)

BibTeX

@inproceedings{bradtke1992neurips-reinforcement,
  title     = {{Reinforcement Learning Applied to Linear Quadratic Regulation}},
  author    = {Bradtke, Steven J.},
  booktitle = {Neural Information Processing Systems},
  year      = {1992},
  pages     = {295-302},
  url       = {https://mlanthology.org/neurips/1992/bradtke1992neurips-reinforcement/}
}