MDP Geometry, Normalization and Reward Balancing Solvers
Abstract
We present a new geometric interpretation of Markov Decision Processes (MDPs), together with a natural normalization procedure that adjusts the value function at each state without altering the advantage of any action under any policy. This advantage-preserving transformation of the MDP motivates a class of algorithms, which we call *Reward Balancing*, that solve MDPs by iterating through such transformations until an approximately optimal policy can be trivially found. We provide a convergence analysis of several algorithms in this class, showing in particular that for MDPs with unknown transition probabilities we can improve upon state-of-the-art sample complexity results.
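As a rough illustration of the idea the abstract describes, here is a minimal Python sketch of a reward-balancing loop for a tabular MDP with known transitions. It is an assumption-laden reconstruction, not the paper's algorithm: the function name `reward_balance`, the choice of per-state offset `f(s) = max_a r(s, a)`, and the stopping rule are all hypothetical. What it does capture is the advantage-preserving transformation, shifting rewards by `gamma * P f - f` changes every policy's value function by `-f` while leaving all advantages intact, so iterating until the per-state maximum reward is near zero makes the greedy one-step policy approximately optimal.

```python
import numpy as np

def reward_balance(P, r, gamma, tol=1e-8, max_iter=10_000):
    """Hypothetical reward-balancing sketch (not the authors' exact method).

    P : (S, A, S) transition tensor, P[s, a, s'] = Pr(s' | s, a)
    r : (S, A) reward matrix
    gamma : discount factor in [0, 1)

    Each iteration picks offsets f(s) = max_a r(s, a) and applies the
    advantage-preserving shift
        r(s, a) <- r(s, a) + gamma * sum_s' P(s' | s, a) f(s') - f(s),
    which subtracts f from every policy's value function without changing
    any advantage. Once max_a r(s, a) is near zero at every state, the
    rewards are "balanced" and acting greedily on them is near-optimal.
    """
    r = r.copy()
    for _ in range(max_iter):
        f = r.max(axis=1)                      # per-state offset (assumed choice)
        r = r + gamma * P @ f - f[:, None]     # advantage-preserving shift
        if np.abs(r.max(axis=1)).max() < tol:  # rewards balanced at every state
            break
    return r.argmax(axis=1)                    # greedy policy on balanced rewards
```

Under this offset choice the loop is equivalent to value iteration on the residual rewards: the balanced rewards converge to the optimal advantages, which are zero exactly at optimal actions, so the final `argmax` recovers an (approximately) optimal policy.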
Cite
Text
Mustafin et al. "MDP Geometry, Normalization and Reward Balancing Solvers." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.
Markdown
[Mustafin et al. "MDP Geometry, Normalization and Reward Balancing Solvers." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.](https://mlanthology.org/aistats/2025/mustafin2025aistats-mdp/)
BibTeX
@inproceedings{mustafin2025aistats-mdp,
title = {{MDP Geometry, Normalization and Reward Balancing Solvers}},
author = {Mustafin, Arsenii and Pakharev, Aleksei and Olshevsky, Alex and Paschalidis, Ioannis},
booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
year = {2025},
pages = {2476--2484},
volume = {258},
url = {https://mlanthology.org/aistats/2025/mustafin2025aistats-mdp/}
}