Online Learning with Dynamics: A Minimax Perspective

Abstract

We consider the problem of online learning with dynamics, where a learner interacts with a stateful environment over multiple rounds. In each round of the interaction, the learner selects a policy to deploy and incurs a cost that depends on both the chosen policy and the current state of the world. The state-evolution dynamics and the costs are allowed to be time-varying, in a possibly adversarial way. In this setting, we study the problem of minimizing policy regret and provide non-constructive upper bounds on the minimax rate for the problem.
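
The interaction protocol and the notion of policy regret described above can be illustrated with a small sketch. This is only a toy under stated assumptions, not the paper's formal setup: the names `rollout` and `policy_regret`, the scalar state, the representation of a policy as a single number, and the ordering (cost is charged first, then the state evolves) are all hypothetical choices made for illustration. Policy regret here compares the learner's total cost with the cost the best fixed policy would have incurred had it been deployed in every round, including its effect on the states.

```python
from typing import Callable, Sequence

# Hypothetical toy types: a scalar state and a policy represented by one number.
State = float
Policy = float
Dynamics = Callable[[State, Policy], State]  # per-round state transition
Cost = Callable[[Policy, State], float]      # per-round cost


def rollout(policies: Sequence[Policy], dynamics: Sequence[Dynamics],
            costs: Sequence[Cost], x0: State) -> float:
    """Total cost when policies[t] is deployed in round t, starting from x0."""
    x, total = x0, 0.0
    for pi, f, c in zip(policies, dynamics, costs):
        total += c(pi, x)  # cost depends on the chosen policy and current state
        x = f(x, pi)       # the state then evolves (assumed ordering)
    return total


def policy_regret(played: Sequence[Policy], comparators: Sequence[Policy],
                  dynamics: Sequence[Dynamics], costs: Sequence[Cost],
                  x0: State) -> float:
    """Learner's cost minus the cost of the best fixed comparator policy,
    where the comparator is rolled out on its own counterfactual states."""
    horizon = len(played)
    learner_cost = rollout(played, dynamics, costs, x0)
    best_fixed = min(rollout([pi] * horizon, dynamics, costs, x0)
                     for pi in comparators)
    return learner_cost - best_fixed


# Toy instance: the state accumulates the policy value, and the cost is the state.
T = 3
dynamics = [lambda x, a: x + a] * T
costs = [lambda a, x: x] * T
regret = policy_regret([1.0, 0.0, 0.0], [0.0, 1.0], dynamics, costs, x0=0.0)
```

In this toy instance a single poor choice in the first round keeps costing the learner in later rounds through the state, which is the effect policy regret is meant to capture and that a per-round comparison on the learner's own states would miss.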

Cite

Text

Bhatia and Sridharan. "Online Learning with Dynamics: A Minimax Perspective." Neural Information Processing Systems, 2020.

Markdown

[Bhatia and Sridharan. "Online Learning with Dynamics: A Minimax Perspective." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/bhatia2020neurips-online/)

BibTeX

@inproceedings{bhatia2020neurips-online,
  title     = {{Online Learning with Dynamics: A Minimax Perspective}},
  author    = {Bhatia, Kush and Sridharan, Karthik},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/bhatia2020neurips-online/}
}