Sketched Newton Value Iteration for Large-Scale Markov Decision Processes

Abstract

Value Iteration (VI) is a classic algorithm for solving Markov Decision Processes (MDPs), and it lays the foundation for more advanced reinforcement learning algorithms such as Q-learning. As a first-order method, however, VI may require a large number of iterations to converge. In this paper, we introduce the Newton Value Iteration (NVI) algorithm, which, unlike some previous second-order methods, removes the dependence on the action-space dimension. Consequently, NVI can efficiently handle MDPs with large action spaces. Building upon NVI, we propose a novel approach called Sketched Newton Value Iteration (SNVI) to tackle MDPs with both large state and action spaces. SNVI not only inherits the stability and fast convergence of second-order algorithms but also significantly reduces computational complexity, making it highly scalable. Extensive experiments demonstrate the superiority of our algorithms over traditional VI and previously proposed second-order VI algorithms.
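For readers unfamiliar with the baseline the paper improves upon, the following is a minimal sketch of classic tabular value iteration (the first-order method the abstract refers to), repeatedly applying the Bellman optimality update until the value function stabilizes. The tiny 2-state, 2-action MDP here is invented purely for illustration and is not from the paper.

```python
# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
# This toy MDP is a made-up example, not taken from the paper.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)],           1: [(1, 0.5), (0, 0.5)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
gamma = 0.9  # discount factor

def value_iteration(P, R, gamma, tol=1e-8, max_iter=10_000):
    """Classic (first-order) value iteration: iterate the Bellman operator."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iter):
        # Bellman optimality update:
        #   V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
        V_new = {
            s: max(
                R[s][a] + gamma * sum(p * V[sp] for sp, p in P[s][a])
                for a in P[s]
            )
            for s in P
        }
        # Stop when the sup-norm change falls below the tolerance.
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new
    return V

V_star = value_iteration(P, R, gamma)
print(V_star)
```

Each sweep contracts the sup-norm error by a factor of gamma, which is exactly why convergence is slow when gamma is close to 1; the paper's Newton-type updates aim to overcome this first-order behavior.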

Cite

Text

Liu et al. "Sketched Newton Value Iteration for Large-Scale Markov Decision Processes." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I12.29301

Markdown

[Liu et al. "Sketched Newton Value Iteration for Large-Scale Markov Decision Processes." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/liu2024aaai-sketched/) doi:10.1609/AAAI.V38I12.29301

BibTeX

@inproceedings{liu2024aaai-sketched,
  title     = {{Sketched Newton Value Iteration for Large-Scale Markov Decision Processes}},
  author    = {Liu, Jinsong and Xie, Chenghan and Deng, Qi and Ge, Dongdong and Ye, Yinyu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {13936--13944},
  doi       = {10.1609/AAAI.V38I12.29301},
  url       = {https://mlanthology.org/aaai/2024/liu2024aaai-sketched/}
}