Online Learning with Dynamics: A Minimax Perspective
Abstract
We consider the problem of online learning with dynamics, where a learner interacts with a stateful environment over multiple rounds. In each round of the interaction, the learner selects a policy to deploy and incurs a cost that depends on both the chosen policy and the current state of the world. The state-evolution dynamics and the costs are allowed to be time-varying, in a possibly adversarial way. In this setting, we study the problem of minimizing policy regret and provide non-constructive upper bounds on the minimax rate for the problem.
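As a rough illustration of the quantity the abstract refers to, policy regret can be written as follows. The notation is an assumption made for this sketch and is not taken from the abstract: $\Pi$ is the policy class, $\pi_t$ the policy deployed in round $t$, $x_t$ the state, $\Phi_t$ the (possibly adversarial) dynamics, and $c_t$ the cost. Deterministic dynamics are assumed for simplicity; the paper's own formulation may differ, for example by including noise and expectations.

```latex
% Sketch under assumed notation; not the paper's exact definition.
% State evolution under the learner's chosen policies:
\[
  x_{t+1} = \Phi_t(x_t, \pi_t), \qquad t = 1, \dots, T .
\]
% Counterfactual state sequence had a fixed policy \pi been played in every round:
\[
  x_1^{\pi} = x_1, \qquad x_{t+1}^{\pi} = \Phi_t\bigl(x_t^{\pi}, \pi\bigr) .
\]
% Policy regret: the learner's cumulative cost against the best fixed policy,
% where the comparator is evaluated on its own counterfactual states.
\[
  \mathrm{Reg}_T
  = \sum_{t=1}^{T} c_t(\pi_t, x_t)
  \;-\; \inf_{\pi \in \Pi} \sum_{t=1}^{T} c_t\bigl(\pi, x_t^{\pi}\bigr) .
\]
```

The point of the comparison is that the comparator policy is charged on the states it would itself have induced, not on the states visited by the learner.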
Cite
Text
Bhatia and Sridharan. "Online Learning with Dynamics: A Minimax Perspective." Neural Information Processing Systems, 2020.
Markdown
[Bhatia and Sridharan. "Online Learning with Dynamics: A Minimax Perspective." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/bhatia2020neurips-online/)
BibTeX
@inproceedings{bhatia2020neurips-online,
title = {{Online Learning with Dynamics: A Minimax Perspective}},
author = {Bhatia, Kush and Sridharan, Karthik},
booktitle = {Neural Information Processing Systems},
year = {2020},
url = {https://mlanthology.org/neurips/2020/bhatia2020neurips-online/}
}