AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity

Abstract

In this paper, we propose AsyncQVI, an asynchronous-parallel Q-value iteration for discounted Markov decision processes whose transition and reward can only be sampled through a generative model. AsyncQVI is also the first asynchronous-parallel algorithm for discounted Markov decision processes that has a sample complexity, which nearly matches the theoretical lower bound. The relatively low memory footprint and parallel ability make AsyncQVI suitable for large-scale applications. In numerical tests, we compare AsyncQVI with four sample-based value iteration methods. The results show that our algorithm is highly efficient and achieves linear parallel speedup.

Cite

Text

Zeng et al. "AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity." Artificial Intelligence and Statistics, 2020.

Markdown

[Zeng et al. "AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/zeng2020aistats-asyncqvi/)

BibTeX

@inproceedings{zeng2020aistats-asyncqvi,
  title     = {{AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity}},
  author    = {Zeng, Yibo and Feng, Fei and Yin, Wotao},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {713-723},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/zeng2020aistats-asyncqvi/}
}