Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs via Approximation by Discounted-Reward MDPs

Abstract

We study infinite-horizon average-reward reinforcement learning with linear Markov decision processes (MDPs). Algorithm design in this setting is challenging because the associated Bellman operator is not a contraction. Previous approaches either suffer from computational inefficiency or require strong assumptions on the dynamics, such as ergodicity, to achieve a regret bound of $\widetilde{\mathcal{O}}(\sqrt{T})$. In this paper, we propose the first algorithm that achieves $\widetilde{\mathcal{O}}(\sqrt{T})$ regret with computational complexity polynomial in the problem parameters, without making strong assumptions on the dynamics. Our approach approximates the average-reward setting by a discounted MDP with a carefully chosen discount factor and then applies optimistic value iteration. Specifically, the algorithm plans a nonstationary policy through optimistic value iteration and follows that policy until a specified information metric of the collected data doubles. Additionally, we introduce a value function clipping procedure that limits the span of the value function, which yields sample efficiency.
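
To make the plan-then-follow structure concrete, below is a minimal, hypothetical Python sketch. The determinant of the feature covariance as the information metric, the form of the discount factor, the clipping threshold, and the stubbed planner and environment are all illustrative assumptions, not the paper's exact construction; the nonstationarity of the planned policy is also abstracted away.

import numpy as np

rng = np.random.default_rng(0)
T = 1000                           # total interaction horizon (assumed known)
d = 4                              # feature dimension of the linear MDP
gamma = 1.0 - 1.0 / np.sqrt(T)     # assumed form of the "carefully chosen" discount factor
span_cap = 1.0 / (1.0 - gamma)     # assumed clipping threshold on the value span

def clip_values(v: np.ndarray) -> np.ndarray:
    # Clipping procedure: cap the span of the optimistic value estimates.
    return np.minimum(v, v.min() + span_cap)

def plan(n_states: int = 16, iters: int = 50) -> np.ndarray:
    # Stand-in for optimistic value iteration on the discounted proxy MDP:
    # a toy random model replaces the estimated linear model, and the
    # clipping step is applied after every Bellman backup.
    P = rng.dirichlet(np.ones(n_states), size=n_states)  # toy transition matrix
    r = rng.uniform(size=n_states)                       # toy optimistic rewards
    v = np.zeros(n_states)
    for _ in range(iters):
        v = clip_values(r + gamma * P @ v)               # backup, then clip
    return v                                             # stands in for a policy

def step(policy) -> np.ndarray:
    # Stub environment interaction; returns the observed feature vector.
    return rng.normal(size=d) / np.sqrt(d)

Sigma = np.eye(d)                  # regularized feature covariance of the data
det_at_plan, policy = 0.0, None
for t in range(T):
    # Replan only when the information metric (here, det of the covariance)
    # has doubled since the last planning step; otherwise keep following
    # the previously planned policy.
    if policy is None or np.linalg.det(Sigma) >= 2.0 * det_at_plan:
        policy, det_at_plan = plan(), np.linalg.det(Sigma)
    phi = step(policy)
    Sigma += np.outer(phi, phi)    # accumulate information from the new sample

Under bounded features, a determinant-doubling rule of this kind triggers only $\mathcal{O}(d \log T)$ replanning steps, which is the usual route to total computation polynomial in the problem parameters.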

Cite

Text

Hong et al. "Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs via Approximation by Discounted-Reward MDPs." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.

Markdown

[Hong et al. "Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs via Approximation by Discounted-Reward MDPs." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.](https://mlanthology.org/aistats/2025/hong2025aistats-reinforcement/)

BibTeX

@inproceedings{hong2025aistats-reinforcement,
  title     = {{Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs via Approximation by Discounted-Reward MDPs}},
  author    = {Hong, Kihyuk and Chae, Woojin and Zhang, Yufan and Lee, Dabeen and Tewari, Ambuj},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  year      = {2025},
  pages     = {2989--2997},
  volume    = {258},
  url       = {https://mlanthology.org/aistats/2025/hong2025aistats-reinforcement/}
}