Online Markov Decision Processes Configuration with Continuous Decision Space

Maran, Davide; Olivieri, Pierriccardo; Stradi, Francesco Emanuele; Urso, Giuseppe; Gatti, Nicola; Restelli, Marcello

doi:10.1609/AAAI.V38I13.29344

AAAI 2024 pp. 14315-14322

Abstract

In this paper, we investigate the optimal online configuration of episodic Markov decision processes when the space of possible configurations is continuous. Specifically, we study the interaction between a learner (referred to as the configurator) and an agent with a fixed, unknown policy, where the learner aims to minimize her losses by choosing transition functions in an online fashion. The losses may be unrelated to the agent's rewards. This problem applies to many real-world scenarios in which the learner seeks to manipulate the Markov decision process to her advantage. We study both deterministic and stochastic settings, in which the losses are either fixed or sampled from an unknown probability distribution, respectively. We design two algorithms that rely on occupancy measures to explore the continuous space of transition functions optimistically, achieving constant regret in deterministic settings and sublinear regret in stochastic settings, respectively. Moreover, we prove that the regret bound is tight with respect to any constant factor in deterministic settings. Finally, we compare the empirical performance of our algorithms with a baseline in synthetic experiments.
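
As a rough illustration of the occupancy-measure idea mentioned in the abstract, the sketch below computes the occupancy measure that a fixed agent policy induces under a given transition function, and the expected episodic loss as a linear function of that measure. It is not the paper's algorithms: the function names (`occupancy_measure`, `expected_loss`), the tabular two-state example, the stationary policy, and the loss values are all assumptions made for illustration.

```python
import numpy as np

def occupancy_measure(P, pi, mu, H):
    """Return q with q[h, s, a] = Pr(s_h = s, a_h = a).

    P  : array (S, A, S), transition function chosen by the configurator
    pi : array (S, A), the agent's fixed policy (assumed stationary here)
    mu : array (S,), initial state distribution
    H  : episode length
    """
    S, A = pi.shape
    q = np.zeros((H, S, A))
    d = mu.astype(float)                      # state distribution at step h
    for h in range(H):
        q[h] = d[:, None] * pi                # joint over (state, action)
        d = np.einsum("sa,sat->t", q[h], P)   # push forward through P
    return q

def expected_loss(q, loss):
    """Expected episodic loss; linear in the occupancy measure q."""
    return float((q * loss[None, :, :]).sum())

# Hypothetical toy instance: 2 states, 2 actions, horizon 3.
H = 3
mu = np.array([1.0, 0.0])                     # always start in state 0
pi = np.array([[0.5, 0.5],                    # agent's policy in state 0
               [1.0, 0.0]])                   # agent's policy in state 1
loss = np.array([[0.0, 0.0],                  # configurator pays nothing in state 0
                 [1.0, 1.0]])                 # and 1 per step in state 1

# Two candidate configurations (transition functions).
P_stay = np.zeros((2, 2, 2)); P_stay[:, :, 0] = 1.0   # always go to state 0
P_move = np.zeros((2, 2, 2)); P_move[:, :, 1] = 1.0   # always go to state 1

loss_stay = expected_loss(occupancy_measure(P_stay, pi, mu, H), loss)
loss_move = expected_loss(occupancy_measure(P_move, pi, mu, H), loss)
print(loss_stay, loss_move)
```

In this toy instance the configurator would prefer `P_stay`, since the agent then never reaches the costly state. The point of the sketch is only that, for a fixed policy, comparing configurations reduces to comparing linear functions of their occupancy measures.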

Cite

Text

Maran et al. "Online Markov Decision Processes Configuration with Continuous Decision Space." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I13.29344

Markdown

[Maran et al. "Online Markov Decision Processes Configuration with Continuous Decision Space." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/maran2024aaai-online/) doi:10.1609/AAAI.V38I13.29344

BibTeX

@inproceedings{maran2024aaai-online,
  title     = {{Online Markov Decision Processes Configuration with Continuous Decision Space}},
  author    = {Maran, Davide and Olivieri, Pierriccardo and Stradi, Francesco Emanuele and Urso, Giuseppe and Gatti, Nicola and Restelli, Marcello},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {14315--14322},
  doi       = {10.1609/AAAI.V38I13.29344},
  url       = {https://mlanthology.org/aaai/2024/maran2024aaai-online/}
}