Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

Abstract

We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random. In particular, we show that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps, with multiplicative pre-factors that are explicit lower-order polynomial terms in the curvature parameters of the problem. Along the way, we also derive stochastic zero-order rates for a class of non-convex optimization problems.
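The two-point, derivative-free estimator that the abstract refers to can be sketched in plain Python: perturb the decision variable along a random unit direction, query the objective at both perturbed points, and form a finite-difference gradient estimate. This is a generic illustration of the two-point zeroth-order scheme, not the paper's exact algorithm; the function names, step size, and smoothing radius below are illustrative assumptions.

```python
import math
import random


def two_point_grad(f, x, r=1e-3):
    """Two-point zeroth-order gradient estimate of f at x.

    Samples a uniformly random unit direction u and returns
    d * (f(x + r*u) - f(x - r*u)) / (2r) * u, an unbiased estimate
    of the gradient of a smoothed version of f.
    """
    d = len(x)
    # Draw a random direction uniformly on the unit sphere.
    u = [random.gauss(0.0, 1.0) for _ in x]
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]
    x_plus = [xi + r * ui for xi, ui in zip(x, u)]
    x_minus = [xi - r * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * r)
    return [scale * ui for ui in u]


def zo_gradient_descent(f, x0, step=0.1, iters=2000, r=1e-3):
    """Run gradient descent using only zeroth-order (function value) queries."""
    x = list(x0)
    for _ in range(iters):
        g = two_point_grad(f, x, r)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x
```

For example, running `zo_gradient_descent` on the quadratic `f(x) = sum(c * c for c in x)` drives the iterate toward the minimizer at the origin using only two function evaluations per step, which is the query model analyzed in the paper.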

Cite

Text

Malik et al. "Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems." Artificial Intelligence and Statistics, 2019.

Markdown

[Malik et al. "Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems." Artificial Intelligence and Statistics, 2019.](https://mlanthology.org/aistats/2019/malik2019aistats-derivativefree/)

BibTeX

@inproceedings{malik2019aistats-derivativefree,
  title     = {{Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems}},
  author    = {Malik, Dhruv and Pananjady, Ashwin and Bhatia, Kush and Khamaru, Koulik and Bartlett, Peter and Wainwright, Martin},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2019},
  pages     = {2916--2925},
  volume    = {89},
  url       = {https://mlanthology.org/aistats/2019/malik2019aistats-derivativefree/}
}