No-Regret Reinforcement Learning in Smooth MDPs

Abstract

Obtaining no-regret guarantees for reinforcement learning (RL) in problems with continuous state and/or action spaces remains one of the major open challenges in the field. Recently, a variety of solutions have been proposed, but beyond very specific settings, the general problem remains unsolved. In this paper, we introduce a novel structural assumption on Markov decision processes (MDPs), namely $\nu$-smoothness, that generalizes most of the settings proposed so far (e.g., linear MDPs and Lipschitz MDPs). To address this challenging scenario, we propose two algorithms for regret minimization in $\nu$-smooth MDPs. Both algorithms build upon the idea of constructing an MDP representation through an orthogonal feature map based on Legendre polynomials. The first algorithm, Legendre-Eleanor, achieves the no-regret property under weaker assumptions but is computationally inefficient, whereas the second one, Legendre-LSVI, runs in polynomial time, although for a smaller class of problems. After analyzing their regret properties, we compare our results with state-of-the-art ones from RL theory, showing that our algorithms achieve the best guarantees.
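
The abstract's key construction is an orthogonal feature map built from Legendre polynomials. The following is a minimal Python sketch of such a feature map for a scalar input on $[0, 1]$; it is only an illustration of the general idea, not the authors' implementation (the function name `legendre_features` and the choice of a one-dimensional input are assumptions made here for brevity).

```python
import numpy as np
from numpy.polynomial.legendre import legval

def legendre_features(x, degree):
    """Evaluate the first degree+1 shifted Legendre polynomials at x in [0, 1].

    Returns phi(x) = (P_0(t), ..., P_degree(t)) with t = 2x - 1, so that the
    polynomials form an orthogonal basis on [0, 1].
    """
    t = 2.0 * np.asarray(x) - 1.0        # map [0, 1] onto [-1, 1]
    feats = []
    for n in range(degree + 1):
        coeffs = np.zeros(n + 1)
        coeffs[n] = 1.0                  # select the n-th Legendre polynomial
        feats.append(legval(t, coeffs))
    return np.stack(feats, axis=-1)

# Example: a 5-dimensional feature vector for a scalar state-action encoding
phi = legendre_features(0.3, degree=4)
```

In the paper's setting, a regression over such features plays the role that the linear feature map plays in linear MDPs, which is what lets Legendre-Eleanor and Legendre-LSVI reuse existing linear-MDP machinery.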

Cite

Text

Maran et al. "No-Regret Reinforcement Learning in Smooth MDPs." International Conference on Machine Learning, 2024.

Markdown

[Maran et al. "No-Regret Reinforcement Learning in Smooth MDPs." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/maran2024icml-noregret/)

BibTeX

@inproceedings{maran2024icml-noregret,
  title     = {{No-Regret Reinforcement Learning in Smooth MDPs}},
  author    = {Maran, Davide and Metelli, Alberto Maria and Papini, Matteo and Restelli, Marcello},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {34760--34789},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/maran2024icml-noregret/}
}