Complexity of Derivative-Free Policy Optimization for Structured $\mathcal{H}_\infty$ Control

Abstract

The applications of direct policy search in reinforcement learning and continuous control have received increasing attention.In this work, we present novel theoretical results on the complexity of derivative-free policy optimization on an important class of robust control tasks, namely the structured $H_\infty$ synthesis with static output feedback. Optimal $H_\infty$ synthesis under structural constraints leads to a constrained nonconvex nonsmooth problem and is typicallyaddressed using subgradient-based policy search techniques that are built upon the concept of Goldstein subdifferential or other notions of enlarged subdifferential. In this paper, we study the complexity of finding $(\delta,\epsilon)$-stationary points for such nonsmooth robust control design tasks using policy optimization methods which can only access the zeroth-order oracle (i.e. the $H_\infty$ norm of the closed-loop system). First, we study the exact oracle setting and identify the coerciveness of the cost function to prove high-probability feasibility/complexity bounds for derivative-free policy optimization on this problem. Next, we derive a sample complexity result for the multi-input multi-output (MIMO) $H_\infty$-norm estimation. We combine this with our analysis to obtain the first sample complexity of model-free, trajectory-based, zeroth-order policy optimization on finding $(\delta,\epsilon)$-stationary points for structured $H_\infty$ control. Numerical results are also provided to demonstrate our theory.

Cite

Text

Guo et al. "Complexity of Derivative-Free Policy Optimization for Structured $\mathcal{H}_\infty$ Control." Neural Information Processing Systems, 2023.

Markdown

[Guo et al. "Complexity of Derivative-Free Policy Optimization for Structured $\mathcal{H}_\infty$ Control." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/guo2023neurips-complexity/)

BibTeX

@inproceedings{guo2023neurips-complexity,
  title     = {{Complexity of Derivative-Free Policy Optimization for Structured $\mathcal{H}_\infty$ Control}},
  author    = {Guo, Xingang and Keivan, Darioush and Dullerud, Geir and Seiler, Peter and Hu, Bin},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/guo2023neurips-complexity/}
}