Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator

Abstract

The risk-sensitive linear quadratic regulator is one of the most fundamental problems in risk-sensitive optimal control. In this paper, we study online adaptive control of the risk-sensitive linear quadratic regulator in the finite-horizon episodic setting. We propose a simple least-squares greedy algorithm and show that it achieves $\widetilde{\mathcal{O}}(\log N)$ regret under a specific identifiability assumption, where $N$ is the total number of episodes. If the identifiability assumption is not satisfied, we propose incorporating exploration noise into the least-squares-based algorithm, resulting in an algorithm with $\widetilde{\mathcal{O}}(\sqrt{N})$ regret. To the best of our knowledge, this is the first set of regret bounds for the episodic risk-sensitive linear quadratic regulator. Our proof relies on a perturbation analysis of the less-standard Riccati equations for risk-sensitive linear quadratic control, and on a delicate analysis of the loss in the risk-sensitive performance criterion caused by applying a suboptimal controller during online learning.
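
To make the least-squares ingredient concrete, the following is a minimal sketch of estimating the dynamics from episodic data, with optional exploration noise added to the inputs. It is not the paper's algorithm: the linear model $x_{t+1} = A x_t + B u_t + w_t$, the function names `collect_episode` and `estimate_dynamics`, the ridge term, and the parameters `explore_std` and `noise_std` are all assumptions made for illustration. The risk-sensitive Riccati step that would turn the estimates into a controller is omitted.

```python
import numpy as np


def collect_episode(A, B, K, horizon, rng, explore_std=0.0, noise_std=0.1):
    """Roll out u_t = K x_t + exploration noise on x_{t+1} = A x_t + B u_t + w_t.

    All names and defaults here are illustrative assumptions.
    """
    n, m = B.shape
    x = rng.standard_normal(n)  # random initial state (an assumption)
    xs, us, xs_next = [], [], []
    for _ in range(horizon):
        # Feedback input plus optional Gaussian exploration noise.
        u = K @ x + explore_std * rng.standard_normal(m)
        x_new = A @ x + B @ u + noise_std * rng.standard_normal(n)
        xs.append(x)
        us.append(u)
        xs_next.append(x_new)
        x = x_new
    return np.array(xs), np.array(us), np.array(xs_next)


def estimate_dynamics(xs, us, xs_next, ridge=1e-6):
    """Ridge-regularised least-squares estimate of (A, B) from transitions.

    Solves min_theta ||xs_next - [xs, us] theta||^2 + ridge * ||theta||^2.
    The small ridge term is an assumption added for numerical stability.
    """
    z = np.hstack([xs, us])                        # regressors, shape (T, n + m)
    gram = z.T @ z + ridge * np.eye(z.shape[1])
    theta = np.linalg.solve(gram, z.T @ xs_next)   # shape (n + m, n)
    n = xs.shape[1]
    return theta[:n].T, theta[n:].T                # A_hat (n, n), B_hat (n, m)
```

Transitions from several episodes can be stacked row-wise with `np.vstack` before calling `estimate_dynamics`. Note that with `explore_std=0` and a single fixed gain `K`, the inputs are a linear function of the states, so the regressors are collinear and $A$ and $B$ cannot be separated from that data alone; this is one intuition for why either an identifiability condition or exploration noise is needed, although the precise assumption is not stated in the abstract.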

Cite

Text

Xu et al. "Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator." International Conference on Learning Representations, 2025.

Markdown

[Xu et al. "Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/xu2025iclr-regret/)

BibTeX

@inproceedings{xu2025iclr-regret,
  title     = {{Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator}},
  author    = {Xu, Wenhao and Gao, Xuefeng and He, Xuedong},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/xu2025iclr-regret/}
}