MDP Geometry, Normalization and Reward Balancing Solvers

Abstract

We present a new geometric interpretation of Markov Decision Processes (MDPs) with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This advantage-preserving transformation of the MDP motivates a class of algorithms we call *Reward Balancing*, which solve MDPs by iterating through these transformations until an approximately optimal policy can be trivially read off. We provide a convergence analysis of several algorithms in this class, in particular showing that for MDPs with unknown transition probabilities we can improve upon state-of-the-art sample complexity results.
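To make the idea concrete, below is a minimal sketch (ours, not taken from the paper) of one plausible member of the Reward Balancing class for a tabular MDP with known transition probabilities and discount factor gamma < 1. The function name `reward_balancing`, the array layout, and the specific per-state shift `phi(s) = max_a r(s, a)` are illustrative assumptions; the shift is applied via the potential-based shaping identity, which is one advantage-preserving transformation of the kind the abstract describes.

```python
import numpy as np

def reward_balancing(P, R, gamma, eps=1e-6, max_iter=10_000):
    """Illustrative Reward Balancing sketch for a tabular MDP.

    P: (S, A, S) transition probabilities, R: (S, A) rewards, gamma < 1.
    Each iteration shifts rewards by a potential phi(s) = max_a r(s, a),
    applied as r(s, a) <- r(s, a) - phi(s) + gamma * E_{s'}[phi(s')].
    Such a shift preserves every action's advantage under every policy.
    Once max_a r(s, a) is approximately zero at every state, the greedy
    action under the balanced rewards is approximately optimal.
    """
    S, A, _ = P.shape
    R_bal = R.astype(float).copy()
    phi_total = np.zeros(S)  # accumulated potential
    for _ in range(max_iter):
        phi = R_bal.max(axis=1)          # per-state shift
        if np.abs(phi).max() < eps:      # rewards are balanced
            break
        # potential-based shaping step; P @ phi has shape (S, A)
        R_bal = R_bal - phi[:, None] + gamma * (P @ phi)
        phi_total += phi
    greedy_policy = R_bal.argmax(axis=1)
    return greedy_policy, phi_total
```

Under this particular choice of shift the accumulated potential converges to the optimal value function, so the loop terminates once every state's maximal balanced reward is near zero.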

Cite

Text

Mustafin et al. "MDP Geometry, Normalization and Reward Balancing Solvers." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.

Markdown

[Mustafin et al. "MDP Geometry, Normalization and Reward Balancing Solvers." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.](https://mlanthology.org/aistats/2025/mustafin2025aistats-mdp/)

BibTeX

@inproceedings{mustafin2025aistats-mdp,
  title     = {{MDP Geometry, Normalization and Reward Balancing Solvers}},
  author    = {Mustafin, Arsenii and Pakharev, Aleksei and Olshevsky, Alex and Paschalidis, Ioannis},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  year      = {2025},
  pages     = {2476--2484},
  volume    = {258},
  url       = {https://mlanthology.org/aistats/2025/mustafin2025aistats-mdp/}
}