Online Markov Decision Processes Configuration with Continuous Decision Space
Abstract
In this paper, we investigate the optimal online configuration of episodic Markov decision processes when the space of possible configurations is continuous. Specifically, we study the interaction between a learner (referred to as the configurator) and an agent with a fixed, unknown policy, where the learner aims to minimize her losses by choosing transition functions in an online fashion. The losses may be unrelated to the agent's rewards. This problem applies to many real-world scenarios where the learner seeks to manipulate the Markov decision process to her advantage. We study both deterministic and stochastic settings, where the losses are either fixed or sampled from an unknown probability distribution. We design two algorithms that rely on occupancy measures to optimistically explore the continuous space of transition functions, achieving constant regret in deterministic settings and sublinear regret in stochastic settings, respectively. Moreover, we prove that the regret bound is tight with respect to any constant factor in deterministic settings. Finally, we compare the empirical performance of our algorithms with a baseline in synthetic experiments.
Cite
Text
Maran et al. "Online Markov Decision Processes Configuration with Continuous Decision Space." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I13.29344
Markdown
[Maran et al. "Online Markov Decision Processes Configuration with Continuous Decision Space." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/maran2024aaai-online/) doi:10.1609/AAAI.V38I13.29344
BibTeX
@inproceedings{maran2024aaai-online,
title = {{Online Markov Decision Processes Configuration with Continuous Decision Space}},
author = {Maran, Davide and Olivieri, Pierriccardo and Stradi, Francesco Emanuele and Urso, Giuseppe and Gatti, Nicola and Restelli, Marcello},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2024},
pages = {14315-14322},
doi = {10.1609/AAAI.V38I13.29344},
url = {https://mlanthology.org/aaai/2024/maran2024aaai-online/}
}