Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning

Santos, Pedro Pinto; Sardinha, Alberto; Melo, Francisco S.

ICLR 2026

Abstract

We contribute the first approach to solving infinite-horizon discounted general-utility Markov decision processes (GUMDPs) in the single-trial regime, i.e., when the agent's performance is evaluated on a single trajectory. First, we provide fundamental results on policy optimization in the single-trial regime: we investigate which class of policies suffices for optimality, cast the problem as a particular MDP equivalent to the original problem, and study the computational hardness of policy optimization. Second, we show how online planning techniques, in particular a Monte-Carlo tree search algorithm, can be leveraged to solve GUMDPs in the single-trial regime. Third, we provide experimental results showing that our approach outperforms relevant baselines.

Cite

Text

Santos et al. "Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning." International Conference on Learning Representations, 2026.

Markdown

[Santos et al. "Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/santos2026iclr-solving/)

BibTeX

@inproceedings{santos2026iclr-solving,
  title     = {{Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning}},
  author    = {Santos, Pedro Pinto and Sardinha, Alberto and Melo, Francisco S.},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/santos2026iclr-solving/}
}