Deterministic Policy Gradient: Convergence Analysis
Abstract
The deterministic policy gradient (DPG) method proposed in Silver et al. [2014] has been demonstrated to exhibit superior performance, particularly for applications with multi-dimensional and continuous action spaces. However, it remains unclear whether DPG converges, and if so, how fast it converges and whether it converges as efficiently as other PG methods. In this paper, we provide a theoretical analysis of DPG to answer those questions. We study the single timescale DPG (often the case in practice) in both on-policy and off-policy settings, and show that both algorithms attain an $\epsilon$-accurate stationary policy with a sample complexity of $\mathcal{O}(\epsilon^{-2})$. Moreover, we establish the convergence rate for DPG under Gaussian noise exploration, which is widely adopted in practice to improve the performance of DPG. To the best of our knowledge, this is the first non-asymptotic convergence characterization for DPG methods.
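The abstract refers to the single-timescale DPG update with Gaussian noise exploration. Below is a minimal illustrative sketch of that update on a toy one-dimensional problem where the action-value function is known in closed form; the linear policy, the toy Q, and the hyperparameters (sigma, alpha, k_star) are assumptions for illustration only, not the paper's setting or analysis.

```python
# Minimal sketch of a deterministic policy gradient (DPG) update with Gaussian
# exploration noise on a toy 1-D problem. Illustrative assumptions: a linear
# deterministic policy mu_theta(s) = theta * s and a known action value
# Q(s, a) = -(a - k_star * s)^2, so no critic needs to be learned here.
import numpy as np

rng = np.random.default_rng(0)

k_star = 1.5   # optimal linear gain: a* = k_star * s
sigma = 0.1    # std of the Gaussian exploration noise
alpha = 0.05   # policy step size
theta = 0.0    # policy parameter

def mu(theta, s):
    """Deterministic policy."""
    return theta * s

def grad_a_Q(s, a):
    """Gradient of the toy Q(s, a) = -(a - k_star * s)**2 with respect to a."""
    return -2.0 * (a - k_star * s)

for t in range(2000):
    s = rng.normal()  # sample a state
    # Gaussian exploration: the behavior action would drive environment
    # transitions and critic learning in a full algorithm; omitted here
    # because Q is known in closed form.
    a_explore = mu(theta, s) + sigma * rng.normal()
    # DPG actor update: grad_theta J ~ grad_theta mu(s) * grad_a Q(s, a) at a = mu(s)
    grad_theta_mu = s
    g = grad_theta_mu * grad_a_Q(s, mu(theta, s))
    theta += alpha * g  # stochastic gradient ascent on J(theta)

print(f"learned theta = {theta:.3f} (optimal gain k_star = {k_star})")
```

In this sketch the stochastic gradient is an unbiased estimate of the true policy gradient, so theta converges to k_star; the paper's contribution is to quantify such convergence non-asymptotically for general single-timescale DPG.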
Cite
Text
Xiong et al. "Deterministic Policy Gradient: Convergence Analysis." Uncertainty in Artificial Intelligence, 2022.
Markdown
[Xiong et al. "Deterministic Policy Gradient: Convergence Analysis." Uncertainty in Artificial Intelligence, 2022.](https://mlanthology.org/uai/2022/xiong2022uai-deterministic/)
BibTeX
@inproceedings{xiong2022uai-deterministic,
title = {{Deterministic Policy Gradient: Convergence Analysis}},
author = {Xiong, Huaqing and Xu, Tengyu and Zhao, Lin and Liang, Yingbin and Zhang, Wei},
booktitle = {Uncertainty in Artificial Intelligence},
year = {2022},
pages = {2159-2169},
volume = {180},
url = {https://mlanthology.org/uai/2022/xiong2022uai-deterministic/}
}