Markov Decision Processes with Time-Varying Geometric Discounting
Abstract
Canonical models of Markov decision processes (MDPs) usually assume geometric discounting with a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate that certain applications require modeling time-varying discounting. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective, in which each time step is treated as an independent decision maker with its own (fixed) discount factor, and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and show that computing an SPE is EXPTIME-hard. We also turn to the approximate notion of epsilon-SPE and show that an epsilon-SPE exists under milder assumptions. We present an algorithm to compute an epsilon-SPE and provide an upper bound on its time complexity as a function of the convergence properties of the time-varying discount factor.
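To make the game-theoretic view concrete, here is a minimal Python sketch. It assumes, as a reading of the abstract rather than the paper's exact definitions, that the decision maker ("self") at time t weights a reward received k steps later by gammas[t] ** k. It also truncates the infinite horizon at a finite H = len(gammas) so that plain backward induction applies; it is not the paper's SPE construction or its epsilon-SPE algorithm. Each self treats the policies of later selves as fixed, evaluates that continuation with its own discount factor, and best-responds, which gives an SPE of the truncated game. The function names spe_truncated and expected and the four-state example are hypothetical.

```python
from typing import List, Sequence, Tuple

# P[s][a] lists (next_state, probability) pairs; R[s][a] is the immediate reward.
Transitions = List[List[List[Tuple[int, float]]]]
Rewards = List[List[float]]


def expected(P: Transitions, values: List[float], s: int, a: int) -> float:
    """E[values[s']] for s' drawn from P[s][a]."""
    return sum(p * values[s2] for s2, p in P[s][a])


def spe_truncated(P: Transitions, R: Rewards, gammas: Sequence[float]) -> List[List[int]]:
    """Backward induction over a horizon-H truncation, with H = len(gammas).

    Self t weights rewards k steps ahead by gammas[t] ** k, takes the policies of
    selves t+1..H-1 as fixed, and best-responds. Rewards after step H-1 are ignored.
    Returns policy[t][s], the action self t picks in state s.
    """
    H, n = len(gammas), len(R)
    policy = [[0] * n for _ in range(H)]
    for t in range(H - 1, -1, -1):
        g = gammas[t]
        # Value of the fixed continuation (selves t+1..H-1), discounted with self t's own g.
        cont = [0.0] * n
        for u in range(H - 1, t, -1):
            cont = [R[s][policy[u][s]] + g * expected(P, cont, s, policy[u][s])
                    for s in range(n)]
        # Self t best-responds to that continuation; ties go to the lowest action index.
        for s in range(n):
            q = [R[s][a] + g * expected(P, cont, s, a) for a in range(len(R[s]))]
            policy[t][s] = max(range(len(q)), key=q.__getitem__)
    return policy


# Hypothetical example: take a reward now or pass for a larger one later.
# State 0: take 1 (end) or pass to state 1; state 1: take 2 (end) or pass to state 2;
# state 2: take 4 (end); state 3: absorbing end state with reward 0.
P = [[[(3, 1.0)], [(1, 1.0)]],
     [[(3, 1.0)], [(2, 1.0)]],
     [[(3, 1.0)]],
     [[(3, 1.0)]]]
R = [[1.0, 0.0], [2.0, 0.0], [4.0], [0.0]]

mixed = spe_truncated(P, R, [0.9, 0.4, 0.4, 0.4])
print(mixed[0][0], mixed[1][1])  # prints: 1 0
constant = spe_truncated(P, R, [0.9] * 4)
print(constant[1][1])  # prints: 1
```

With the patient-then-impatient factors [0.9, 0.4, 0.4, 0.4], self 0 in state 0 passes because it anticipates that self 1 will take the reward of 2 (0.9 × 2 = 1.8 > 1), even though self 0 would prefer that self 1 pass on to the reward of 4 (0.9 × 0.9 × 4 = 3.24 > 1.8). With a constant factor of 0.9, self 1 passes as well. Because every self re-evaluates the whole remaining horizon with its own factor, this sketch costs on the order of H² · |S| · |A| · b operations, where b bounds the number of successor states; with a constant discount factor, a single backward pass would suffice.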
Cite
Text
Gan et al. "Markov Decision Processes with Time-Varying Geometric Discounting." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I10.26413
Markdown
[Gan et al. "Markov Decision Processes with Time-Varying Geometric Discounting." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/gan2023aaai-markov/) doi:10.1609/AAAI.V37I10.26413
BibTeX
@inproceedings{gan2023aaai-markov,
title = {{Markov Decision Processes with Time-Varying Geometric Discounting}},
author = {Gan, Jiarui and Hennes, Annika and Majumdar, Rupak and Mandal, Debmalya and Radanovic, Goran},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2023},
pages = {11980-11988},
doi = {10.1609/AAAI.V37I10.26413},
url = {https://mlanthology.org/aaai/2023/gan2023aaai-markov/}
}