No-Regret Learning with High-Probability in Adversarial Markov Decision Processes

Abstract

In a variety of problems, a decision-maker is unaware of the loss function associated with a task, yet it has to minimize this unknown loss in order to accomplish the task. Furthermore, the decision-maker’s task may evolve, resulting in a varying loss function. In this setting, we explore sequential decision-making problems modeled by adversarial Markov decision processes, where the loss function may arbitrarily change at every time step. We consider the bandit feedback scenario, where the agent observes only the loss corresponding to its actions. We propose an algorithm, called online relative-entropy policy search with implicit exploration, that achieves a sublinear regret not only in expectation but, more importantly, with high probability. In particular, we prove that by employing an optimistically biased loss estimator, the proposed algorithm achieves a regret of $\tilde{\mathcal{O}}((T|\mathcal{A}||\mathcal{S}|)^{2/3} \sqrt{\tau})$, where $|\mathcal{S}|$ is the number of states, $|\mathcal{A}|$ is the number of actions, $\tau$ is the mixing time, and $T$ is the time horizon. To our knowledge, the proposed algorithm is the first scheme that enjoys such high-probability regret bounds for general adversarial Markov decision processes in the presence of bandit feedback.
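To illustrate the key ingredient mentioned above, here is a minimal sketch (not the authors' code) of an implicit-exploration (IX) style loss estimator of the kind used in adversarial bandit settings. All names and parameters below are illustrative assumptions: a loss observed with probability `prob` is divided by `prob + gamma` rather than `prob`, which makes the estimate optimistically biased, i.e. it underestimates the true loss in expectation, and this bias is what enables high-probability guarantees.

```python
def ix_loss_estimate(observed_loss, prob, gamma, observed=True):
    """Implicit-exploration (IX) loss estimate.

    observed_loss : the loss value actually incurred/observed
    prob          : probability with which this loss was observed
    gamma         : exploration parameter, gamma > 0
    observed      : whether the loss was observed this round

    Dividing by (prob + gamma) instead of prob keeps the estimate
    bounded and biases it downward (optimistically): its expectation
    is prob * loss / (prob + gamma) <= loss.
    """
    if not observed:
        return 0.0
    return observed_loss / (prob + gamma)
```

With `gamma = 0` this reduces to the standard unbiased importance-weighted estimator; the positive `gamma` trades a small bias for much better concentration of the cumulative estimates.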

Cite

Text

Ghasemi et al. "No-Regret Learning with High-Probability in Adversarial Markov Decision Processes." Uncertainty in Artificial Intelligence, 2021.

Markdown

[Ghasemi et al. "No-Regret Learning with High-Probability in Adversarial Markov Decision Processes." Uncertainty in Artificial Intelligence, 2021.](https://mlanthology.org/uai/2021/ghasemi2021uai-noregret/)

BibTeX

@inproceedings{ghasemi2021uai-noregret,
  title     = {{No-Regret Learning with High-Probability in Adversarial Markov Decision Processes}},
  author    = {Ghasemi, Mahsa and Hashemi, Abolfazl and Vikalo, Haris and Topcu, Ufuk},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2021},
  pages     = {992--1001},
  volume    = {161},
  url       = {https://mlanthology.org/uai/2021/ghasemi2021uai-noregret/}
}