Online Learning from Optimal Actions
Abstract
We study the problem of online contextual optimization where, at each period, instead of observing the loss, we observe, after the fact, the optimal action that an oracle with full knowledge of the objective function would have taken. At each period, the decision-maker has access to a new set of feasible actions to select from, as well as a new contextual function that affects that period's loss. We aim to minimize regret, defined as the difference between our cumulative losses and those incurred by an all-knowing oracle. We obtain the first regret bound for this problem that is logarithmic in the time horizon. Our results are derived through the development and analysis of a novel algorithmic structure that leverages the underlying geometry of the problem.
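To make the objective concrete, here is a minimal formalization of the regret described above, under assumed notation (per-period loss $f_t$, feasible set $\mathcal{X}_t$, and oracle action $x_t^\star$; the paper's own notation may differ):

$$
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \left( f_t(x_t) - f_t(x_t^\star) \right),
\qquad
x_t^\star \in \operatorname*{arg\,min}_{x \in \mathcal{X}_t} f_t(x),
$$

where the learner selects $x_t \in \mathcal{X}_t$ before receiving any feedback and afterwards observes only the oracle action $x_t^\star$, never the loss $f_t$ itself. The main result bounds $\mathrm{Regret}_T$ by a quantity logarithmic in $T$.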
Cite
Text
Besbes et al. "Online Learning from Optimal Actions." Conference on Learning Theory, 2021.

Markdown

[Besbes et al. "Online Learning from Optimal Actions." Conference on Learning Theory, 2021.](https://mlanthology.org/colt/2021/besbes2021colt-online/)

BibTeX
@inproceedings{besbes2021colt-online,
  title     = {{Online Learning from Optimal Actions}},
  author    = {Besbes, Omar and Fonseca, Yuri and Lobel, Ilan},
  booktitle = {Conference on Learning Theory},
  year      = {2021},
  pages     = {586--586},
  volume    = {134},
  url       = {https://mlanthology.org/colt/2021/besbes2021colt-online/}
}