The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis

Abstract

We study the sample complexity of the plug-in approach for learning $\varepsilon$-optimal policies in average-reward Markov decision processes (MDPs) with a generative model. The plug-in approach constructs a model estimate and then computes an average-reward optimal policy in the estimated model. Despite being arguably the simplest algorithm for this problem, the plug-in approach has never been theoretically analyzed. Unlike the more well-studied discounted MDP reduction method, the plug-in approach requires no prior problem information or parameter tuning. Our results fill this gap and address the limitations of prior approaches, as we show that the plug-in approach is optimal in several well-studied settings without using prior knowledge. Specifically, it achieves the optimal diameter- and mixing-based sample complexities of $\widetilde{O}\left(SA \frac{D}{\varepsilon^2}\right)$ and $\widetilde{O}\left(SA \frac{\tau_{\mathrm{unif}}}{\varepsilon^2}\right)$, respectively, without knowledge of the diameter $D$ or uniform mixing time $\tau_{\mathrm{unif}}$. We also obtain span-based bounds for the plug-in approach, and complement them with algorithm-specific lower bounds suggesting that they are unimprovable. Our results require novel techniques for analyzing long-horizon problems, which may be broadly useful and which also improve results for the discounted plug-in approach, removing effective-horizon-related sample size restrictions and obtaining the first optimal complexity bounds for the full range of sample sizes without reward perturbation.
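The plug-in approach described above is simple to state: draw samples from the generative model for each state-action pair, form the empirical transition kernel, and solve the estimated MDP. The following is a minimal illustrative sketch, not the paper's algorithm; it uses discounted value iteration as the planning subroutine (the paper's average-reward analysis is more involved), and the function and parameter names are hypothetical.

```python
import numpy as np

def plugin_policy(sample_transition, R, S, A, n, gamma=0.99, iters=2000):
    """Illustrative plug-in approach: estimate the model from n generative-model
    samples per (s, a) pair, then compute an optimal policy of the *estimated* MDP.

    sample_transition(s, a, n): returns an array of n next states drawn i.i.d.
        from the (unknown) transition distribution P(. | s, a).
    R: (S, A) array of rewards.
    Solves the empirical model with discounted value iteration for simplicity.
    """
    # Step 1: build the empirical transition kernel P_hat from sample counts.
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            next_states = sample_transition(s, a, n)
            counts = np.bincount(next_states, minlength=S)
            P_hat[s, a] = counts / n
    # Step 2: value iteration on the estimated MDP (P_hat, R).
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * (P_hat @ V)  # shape (S, A)
        V = Q.max(axis=1)
    # Greedy policy with respect to the estimated model's values.
    return Q.argmax(axis=1), V

# Toy usage: a deterministic 2-state, 2-action MDP where staying in state 0
# via action 0 yields reward 1 and everything else yields reward 0.
S, A = 2, 2
nxt = np.array([[0, 1], [0, 1]])  # nxt[s, a] = deterministic next state
R = np.array([[1.0, 0.0], [0.0, 0.0]])

def sample_transition(s, a, n):
    return np.full(n, nxt[s, a], dtype=int)

pi, V = plugin_policy(sample_transition, R, S, A, n=50)
# Optimal behavior: take action 0 in both states (stay at / return to state 0).
```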

Cite

Text

Zurek and Chen. "The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis." Proceedings of The 36th International Conference on Algorithmic Learning Theory, 2025.

Markdown

[Zurek and Chen. "The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis." Proceedings of The 36th International Conference on Algorithmic Learning Theory, 2025.](https://mlanthology.org/alt/2025/zurek2025alt-plugin/)

BibTeX

@inproceedings{zurek2025alt-plugin,
  title     = {{The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis}},
  author    = {Zurek, Matthew and Chen, Yudong},
  booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory},
  year      = {2025},
  pages     = {1386--1387},
  volume    = {272},
  url       = {https://mlanthology.org/alt/2025/zurek2025alt-plugin/}
}