Geometric Exploration for Online Control
Abstract
We study the control of an \emph{unknown} linear dynamical system under general convex costs. The objective is minimizing regret vs the class of strongly-stable linear policies. In this work, we first consider the case of known cost functions, for which we design the first polynomial-time algorithm with $n^3\sqrt{T}$-regret, where $n$ is the dimension of the state plus the dimension of control input. The $\sqrt{T}$-horizon dependence is optimal, and improves upon the previous best known bound of $T^{2/3}$. The main component of our algorithm is a novel geometric exploration strategy: we adaptively construct a sequence of barycentric spanners in an over-parameterized policy space. Second, we consider the case of bandit feedback, for which we give the first polynomial-time algorithm with $poly(n)\sqrt{T}$-regret, building on Stochastic Bandit Convex Optimization.
Cite
Text
Plevrakis and Hazan. "Geometric Exploration for Online Control." Neural Information Processing Systems, 2020.Markdown
[Plevrakis and Hazan. "Geometric Exploration for Online Control." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/plevrakis2020neurips-geometric/)BibTeX
@inproceedings{plevrakis2020neurips-geometric,
title = {{Geometric Exploration for Online Control}},
author = {Plevrakis, Orestis and Hazan, Elad},
booktitle = {Neural Information Processing Systems},
year = {2020},
url = {https://mlanthology.org/neurips/2020/plevrakis2020neurips-geometric/}
}