Gain Estimation of Linear Dynamical Systems Using Thompson Sampling
Abstract
We formulate the gain estimation problem for linear dynamical systems as a multi-armed bandit. This is an important engineering problem in control design, where performance guarantees are cast in terms of the largest gain of the frequency response of the system. The dynamical system is unknown, and only noisy input-output data are available. In a more general setup, the noise perturbing the data is non-white and its variance in each frequency band is unknown. This results in a two-dimensional Gaussian bandit model with unknown mean and scaled-identity covariance matrix, which corresponds to a two-parameter exponential family. In this bandit framework, the set of means is given by the frequency response of the system. Unlike in traditional bandit problems, the goal is to maximize the probability of choosing the arm whose mean has the largest norm. A problem-dependent lower bound on the expected cumulative regret is derived, and a matching upper bound is obtained for a Thompson-Sampling algorithm under a uniform prior over the variances and the two-dimensional means.
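The abstract does not spell out the sampling step. A minimal sketch of a Thompson-Sampling loop for this kind of bandit follows, and it should not be read as the authors' exact algorithm. Each arm k returns 2-D samples (real and imaginary parts) with mean G(e^{jω_k}) and covariance σ_k² I₂. The learner does not know σ_k. Under a flat (uniform) prior on (μ, σ²), our own derivation gives the posterior after n samples with sample mean x̄ and scatter S = Σ‖x_i − x̄‖² as σ² | data ~ Inv-Gamma(n − 2, S/2) and μ | σ², data ~ N(x̄, σ²/n · I₂). This posterior is proper only for n ≥ 3, so every arm is pulled three times first. The function name `thompson_gain_estimation` and its parameters are hypothetical.

```python
import math
import random


def thompson_gain_estimation(means, sigmas, horizon, seed=0):
    """Sketch: Thompson Sampling for the arm whose 2-D mean has the largest norm.

    means   -- hypothetical complex frequency-response values G(e^{j w_k});
               real/imag parts form the 2-D mean of arm k (used only to simulate data)
    sigmas  -- per-arm noise standard deviations (unknown to the learner)
    horizon -- total number of pulls (at least 3 * len(means) are always made)
    Returns the list of arms pulled, in order.
    """
    rng = random.Random(seed)
    K = len(means)
    # Per-arm sufficient statistics: count, coordinate sums, sum of squared norms.
    n = [0] * K
    sx = [0.0] * K
    sy = [0.0] * K
    sq = [0.0] * K
    pulls = []

    def pull(k):
        # Simulated noisy observation: mean plus isotropic Gaussian noise.
        x = means[k].real + rng.gauss(0.0, sigmas[k])
        y = means[k].imag + rng.gauss(0.0, sigmas[k])
        n[k] += 1
        sx[k] += x
        sy[k] += y
        sq[k] += x * x + y * y
        pulls.append(k)

    # The flat-prior posterior is proper only once n >= 3, so initialise each arm.
    for k in range(K):
        for _ in range(3):
            pull(k)

    while len(pulls) < horizon:
        best_k, best_norm = 0, -1.0
        for k in range(K):
            mx, my = sx[k] / n[k], sy[k] / n[k]
            # Scatter S = sum ||x_i - xbar||^2 = sum ||x_i||^2 - n ||xbar||^2.
            s = max(sq[k] - n[k] * (mx * mx + my * my), 1e-12)
            # Draw sigma^2 ~ Inv-Gamma(n - 2, S / 2) as (S / 2) / Gamma(n - 2, 1).
            var = (s / 2.0) / rng.gammavariate(n[k] - 2, 1.0)
            # Draw mu ~ N(xbar, sigma^2 / n * I_2).
            sd = math.sqrt(var / n[k])
            norm = math.hypot(rng.gauss(mx, sd), rng.gauss(my, sd))
            # Play the arm whose sampled mean has the largest norm (gain).
            if norm > best_norm:
                best_k, best_norm = k, norm
        pull(best_k)
    return pulls
```

As a hypothetical usage example, take G(z) = 1/(1 − 0.5 z⁻¹) evaluated at ω ∈ {0, π/4, π/2, 3π/4, π}. Its largest gain, |G| = 2, is at ω = 0, so over a long horizon most pulls should go to arm 0. Drawing σ² before μ follows the hierarchical form of the posterior. The heavy-tailed variance draws at small n are what drive early exploration.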
Cite
Text
Müller and Rojas. "Gain Estimation of Linear Dynamical Systems Using Thompson Sampling." Artificial Intelligence and Statistics, 2019.
Markdown
[Müller and Rojas. "Gain Estimation of Linear Dynamical Systems Using Thompson Sampling." Artificial Intelligence and Statistics, 2019.](https://mlanthology.org/aistats/2019/muller2019aistats-gain/)
BibTeX
@inproceedings{muller2019aistats-gain,
title = {{Gain Estimation of Linear Dynamical Systems Using Thompson Sampling}},
author = {Müller, Matias I. and Rojas, Cristian R.},
booktitle = {Artificial Intelligence and Statistics},
year = {2019},
pages = {1535-1543},
volume = {89},
url = {https://mlanthology.org/aistats/2019/muller2019aistats-gain/}
}