AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity
Abstract
In this paper, we propose AsyncQVI, an asynchronous-parallel Q-value iteration for discounted Markov decision processes whose transition and reward can only be sampled through a generative model. AsyncQVI is also the first asynchronous-parallel algorithm for discounted Markov decision processes that has a sample complexity, which nearly matches the theoretical lower bound. The relatively low memory footprint and parallel ability make AsyncQVI suitable for large-scale applications. In numerical tests, we compare AsyncQVI with four sample-based value iteration methods. The results show that our algorithm is highly efficient and achieves linear parallel speedup.
Cite
Text
Zeng et al. "AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity." Artificial Intelligence and Statistics, 2020.Markdown
[Zeng et al. "AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/zeng2020aistats-asyncqvi/)BibTeX
@inproceedings{zeng2020aistats-asyncqvi,
title = {{AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity}},
author = {Zeng, Yibo and Feng, Fei and Yin, Wotao},
booktitle = {Artificial Intelligence and Statistics},
year = {2020},
pages = {713-723},
volume = {108},
url = {https://mlanthology.org/aistats/2020/zeng2020aistats-asyncqvi/}
}