Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems
Abstract
We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random. In particular, we show that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps, with multiplicative pre-factors that are explicit lower-order polynomial terms in the curvature parameters of the problem. Along the way, we also derive stochastic zero-order rates for a class of non-convex optimization problems.
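The "two-point" method in the abstract refers to the standard zeroth-order scheme that estimates a gradient from two function evaluations at symmetric perturbations. Below is a minimal, generic sketch of that estimator and the resulting descent loop, not the paper's exact LQR procedure; the function names, step size, and smoothing radius are illustrative choices.

```python
import numpy as np

def two_point_gradient_estimate(f, x, delta, rng):
    """Two-point zeroth-order gradient estimate of f at x.

    Draws u uniformly from the unit sphere and returns
    (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u,
    an unbiased estimate of the gradient of a smoothed version of f.
    Only function values are used, never derivatives.
    """
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform direction on the sphere
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

def zeroth_order_descent(f, x0, step, delta, iters, seed=0):
    """Plain stochastic descent driven by two-point gradient estimates."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x -= step * two_point_gradient_estimate(f, x, delta, rng)
    return x

# Illustrative usage on a simple quadratic objective (stand-in for the
# LQR cost): the iterate should approach the minimizer at the origin.
f = lambda x: float(x @ x)
x_final = zeroth_order_descent(f, np.ones(4), step=0.05, delta=1e-3, iters=2000)
```

For a quadratic objective the two evaluations cancel the even-order terms exactly, so the estimate is unbiased for the true gradient; for general smooth objectives the bias scales with `delta`, which governs the trade-off analyzed in the paper.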
Cite
Text
Malik et al. "Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems." Artificial Intelligence and Statistics, 2019.
Markdown
[Malik et al. "Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems." Artificial Intelligence and Statistics, 2019.](https://mlanthology.org/aistats/2019/malik2019aistats-derivativefree/)
BibTeX
@inproceedings{malik2019aistats-derivativefree,
title = {{Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems}},
author = {Malik, Dhruv and Pananjady, Ashwin and Bhatia, Kush and Khamaru, Koulik and Bartlett, Peter and Wainwright, Martin},
booktitle = {Artificial Intelligence and Statistics},
year = {2019},
pages = {2916--2925},
volume = {89},
url = {https://mlanthology.org/aistats/2019/malik2019aistats-derivativefree/}
}