Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret
Abstract
We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem. While model-free approaches are often favorable in practice, thus far only model-based methods, which rely on costly system identification, have been shown to achieve regret that scales with the optimal dependence on the time horizon T. We present the first model-free algorithm that achieves similar regret guarantees. Our method relies on an efficient policy gradient scheme, and a novel and tighter analysis of the cost of exploration in policy space in this setting.
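The abstract refers to a model-free policy gradient scheme for LQR. As a rough, illustrative sketch only (not the authors' algorithm), the snippet below shows a generic one-point zeroth-order policy gradient update on a static state-feedback gain $u = -Kx$, using only observed rollout costs. The system matrices, initial gain, smoothing radius, and step size are hypothetical placeholders chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 1-input system (placeholder values, not from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

def rollout_cost(K, horizon=50, noise_std=0.1):
    """Average quadratic cost of the linear policy u = -K x over one noisy rollout."""
    x = np.zeros((2, 1))
    total = 0.0
    for _ in range(horizon):
        u = -K @ x
        total += float(x.T @ Q @ x + u.T @ R @ u)
        x = A @ x + B @ u + noise_std * rng.standard_normal((2, 1))
    return total / horizon

def zeroth_order_grad(K, radius=0.05, samples=20):
    """One-point smoothed gradient estimate built from cost evaluations alone."""
    grad = np.zeros_like(K)
    for _ in range(samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)          # random direction on the unit sphere
        grad += rollout_cost(K + radius * U) * U
    return (K.size / (radius * samples)) * grad

# Plain gradient descent on the policy gains (step size is an arbitrary choice).
K = np.array([[0.5, 0.5]])  # assumed initial stabilizing gain
for t in range(200):
    K -= 1e-3 * zeroth_order_grad(K)

print("final gain:", K, "cost:", rollout_cost(K))
```

This is only meant to convey the flavor of gradient-based, model-free policy search in policy space; the paper's actual method and its exploration analysis differ in the details that yield the $\sqrt{T}$ regret guarantee.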
Cite
Text
Cassel and Koren. "Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret." International Conference on Machine Learning, 2021.
Markdown
[Cassel and Koren. "Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/cassel2021icml-online/)
BibTeX
@inproceedings{cassel2021icml-online,
title = {{Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret}},
author = {Cassel, Asaf B and Koren, Tomer},
booktitle = {International Conference on Machine Learning},
year = {2021},
pages = {1304--1313},
volume = {139},
url = {https://mlanthology.org/icml/2021/cassel2021icml-online/}
}