Optimal Dynamic Regret in LQR Control

Abstract

We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an efficient online algorithm that achieves an optimal dynamic (policy) regret of $\tilde{O}(n^{1/3} \mathcal{TV}(M_{1:n})^{2/3} \vee 1)$, where $\mathcal{TV}(M_{1:n})$ is the total variation of any oracle sequence of \emph{Disturbance Action} policies parameterized by $M_1,\ldots,M_n$ --- chosen in hindsight to cater to the unknown nonstationarity. This rate improves on the best known rate of $\tilde{O}(\sqrt{n (\mathcal{TV}(M_{1:n})+1)})$ for general convex losses and is information-theoretically optimal for LQR. The main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster and Simchowitz (2020), as well as a new \emph{proper} learning algorithm with an optimal $\tilde{O}(n^{1/3})$ dynamic regret on a family of "minibatched" quadratic losses, which could be of independent interest.
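For readers unfamiliar with the notation, the following is a minimal sketch of the quantities the bound refers to, using standard definitions from the nonstochastic control literature; the specific norm on the policy matrices below is an assumption for illustration, not taken from this page.

% Dynamic (policy) regret against a time-varying comparator sequence of
% Disturbance Action policies M_1, ..., M_n: the learner's cumulative loss
% minus the loss of the hindsight-optimal nonstationary policy sequence.
\[
  \mathrm{Regret}_n(M_{1:n})
    \;=\; \sum_{t=1}^{n} \ell_t(x_t, u_t)
    \;-\; \sum_{t=1}^{n} \ell_t\bigl(x_t(M_t),\, u_t(M_t)\bigr),
\]
% Total variation of the comparator sequence, measuring how much the
% oracle policies drift over time (Frobenius norm assumed here).
\[
  \mathcal{TV}(M_{1:n}) \;=\; \sum_{t=2}^{n} \lVert M_t - M_{t-1} \rVert_F .
\]

When $\mathcal{TV}(M_{1:n}) = 0$ the comparator is a single fixed policy and the bound reduces to $\tilde{O}(1)$ up to logarithmic factors; larger total variation allows the comparator to track nonstationarity at the cost of a larger regret bound.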

Cite

Text

Baby and Wang. "Optimal Dynamic Regret in LQR Control." Neural Information Processing Systems, 2022.

Markdown

[Baby and Wang. "Optimal Dynamic Regret in LQR Control." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/baby2022neurips-optimal/)

BibTeX

@inproceedings{baby2022neurips-optimal,
  title     = {{Optimal Dynamic Regret in LQR Control}},
  author    = {Baby, Dheeraj and Wang, Yu-Xiang},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/baby2022neurips-optimal/}
}